Hierarchical Music Structure Analysis, Modeling and Resynthesis: A Dynamical Systems and Signal Processing Approach

by Victor Gabriel Adán

B.M., Universidad Nacional Autónoma de México (2002)

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology, September 2005.

© Massachusetts Institute of Technology 2005. All rights reserved.

Author: Program in Media Arts and Sciences, August 5, 2005
Certified by: Barry L. Vercoe, Professor of Media Arts and Sciences, Thesis Supervisor
Accepted by: Andrew B. Lippman, Chairman, Departmental Committee on Graduate Students

Abstract

The problem of creating generative music systems has been approached in different ways, each guided by different goals, aesthetics, beliefs and biases. These generative systems can be divided into two categories: the first is an ad hoc definition of the generative algorithms; the second is based on the idea of modeling and generalizing from preexistent music for the subsequent generation of new pieces. Most inductive models developed in the past have been probabilistic, while the majority of the deductive approaches have been rule based, some of them with very strong assumptions about music. In addition, almost all models have been discrete, most probably influenced by the discontinuous nature of traditional music notation.
We approach the problem of inductive modeling of high level musical structures from a dynamical systems and signal processing perspective, focusing on motion per se, independently of particular musical systems or styles. The point of departure is the construction of a state space that represents geometrically the motion characteristics of music. We address ways in which this state space can be modeled deterministically, as well as ways in which it can be transformed to generate new musical structures. Thus, in contrast to previous approaches to inductive music structure modeling, our models are continuous and mainly deterministic. We also address the problem of extracting a hierarchical representation of music from the state space and how a hierarchical decomposition can become a second source of generalization.

Thesis supervisor: Barry L. Vercoe, D.M.A., Professor of Media Arts and Sciences

Thesis Committee

Thesis supervisor: Barry L. Vercoe, Professor of Media Arts and Sciences, Massachusetts Institute of Technology
Thesis reader: Rosalind Picard, Professor of Media Arts and Sciences, Massachusetts Institute of Technology
Thesis reader: Robert Rowe, Professor, New York University

Acknowledgments

I would first of all like to thank my advisor, Barry Vercoe, for taking me into his research group and providing me with a creative space of complete freedom. None of this work would have been possible without his kind support. Many thanks to the people who took the time to read the drafts of this work and gave me invaluable criticisms, particularly my readers: Professors Rosalind Picard, Robert Rowe and again Barry Vercoe. Many special thanks go to the other Music, Mind and Machine group members: Judy Brown, Wei Chai, John Harrison, Nyssim Lefford, and Brian Whitman for being such incredible sources of inspiration and knowledge.
Special thanks to Wei for her constant willingness to listen to my math questions and for her patience. To Brian for being my second "advisor". His casual, almost accidental lessons gave me the widest map of the music-machine learning territory I could have wished for. Thanks to John for being a tremendous "officemate" and friend. His intelligence, skepticism and passion for music and technology made him a real pleasure to work with and a constant source of inspiration.

Many other Media Lab members deserve special mention as well: Thanks to Hugo Solis for helping me with my transition into the lab, doing everything from paperwork to housing. I probably wouldn't have come here without his help. Also thanks to all 45 Banks people for being such wonderful friends and roommates: Tristan Jehan, Luke Ouko, Carla Gomez and again, Hugo.

In addition I have to thank all the authors cited in this text. This thesis wouldn't have existed without their work. I want to give special thanks to Bernd Schoner for his beautiful MIT theses. They were a principal source of inspiration and insight for the present work.

My greatest gratitude goes to my family. Their examples of perseverance and faith have been my wings and strength. Last but certainly not least, I want to thank my dear wife "Cosi" for her love and support (and her editing skills). She is the most important rediscovery I have made in my search for the meaning of it all.

Table of Contents

1 Introduction and Background
  1.1 Motivations
  1.2 Generative Music Systems and Algorithmic Composition
  1.3 Modeling
    1.3.1 Models of Music
    1.3.2 Our Approach to Music Structure Modeling

2 Musical Representation

3 State Space Reconstruction and Modeling
  3.1 State Spaces
  3.2 Dynamical Systems
    3.2.1 Deterministic Dynamical Systems
    3.2.2 State Space Reconstruction
    3.2.3 Calculating the Appropriate Embedding Dimension
    3.2.4 Calculating the Time Lag τ
  3.3 Modeling
    3.3.1 Spatial Interpolation = Temporal Extrapolation

4 Hierarchical Signal Decomposition
  4.1 Divide and Conquer
    4.1.1 Fourier Transform
    4.1.2 Short Time Fourier Transform (STFT)
    4.1.3 Wavelet Transform
    4.1.4 Data Specific Basis Functions
    4.1.5 Principal Component Analysis (PCA)
  4.2 Summary

5 Musical Applications
  5.1 Music Structure Modeling and Resynthesis
    5.1.1 Interpolation
    5.1.2 Abstracting Dynamics from States
    5.1.3 Decompositions
  5.2 Outside-Time Structures
    5.2.1 Sieve Estimation and Fitting
  5.3 Musical Chimaeras: Combining Multiple Spaces

6 Experiments and Conclusions
  6.1 Experiments
    6.1.1 Merging Two Pieces
  6.2 Conclusions and Future Work
    6.2.1 System's Strengths, Limitations and Possible Refinements

Appendix A Notation

Appendix B Pianorolls
  B.1 Etude 4
  B.2 Etude 6
  B.3 Interpolation: Etude 4
  B.4 Combination of Etudes 4 and 6, Method 1
  B.5 Combination of Etudes 4 and 6, Method 2
  B.6 Combination of Etudes 4 and 6, Method 3 with Components Modeled Jointly
  B.7 Combination of Etudes 4 and 6, Method 3 with Components Modeled Independently

Appendix C Experiment Forms

Bibliography

CHAPTER ONE
Introduction and Background

1.1 Motivations

There are two motivations for the present work. The first comes from my interest in music analysis. Several music theories have been developed as useful tools for analyzing and characterizing music. Most of these theories, such as Riemann's theory of tonal music, Schenkerian analysis [31] and Forte's atonal theory, are specific to particular styles or systems of composition. It would be interesting to see the development of an analytical method useful for any kind of music. Such a method would have to be based on features that are present in all music, the element of motion being the only constant characteristic. Thus, my interest in finding a more general method of musical analysis applicable to a variety of music has motivated me to classify music in terms of the different kinds of motion it manifests, encouraging me to define a taxonomy of motion. While the importance of motion in music is obvious, no work that I know of has systematically studied music from this perspective.

The second motivation comes from my interest in understanding the behavior of my musical imagination. The process of composing usually includes writing or recording the imagined sound evolutions as they are being heard in one's mind. Multiple paths and combinations are explored during the process of imagining new music, so that the score we ultimately write is but an instance of the multiple combinations explored in one's mind.
It is also a simplification of the imagined music, because the representation we use to record the imagined sound evolutions is incomplete. In other words, we are unable to represent accurately every detail of the imagined universe. Thus, this representation is like a photograph of the dynamic, ever-changing world of the imaginary.

Why decide on one sequence or combination over another? Is there a single best sequence, a better architecture? My personal answer to this question is no. This has led me to become interested in pursuing the creation of a meta-music: a system that generates the fixed notated music along with all its other implied possibilities.

1.2 Generative Music Systems and Algorithmic Composition

Algorithmic composition can be broadly defined as the creation of algorithms, or automata in general, designed for the automatic generation of music.¹ In other words, in algorithmic composition the composer does not decide on the musical events directly but creates an intermediary that will choose the specific musical events for him. This intermediary can be a mechanical automaton or a mathematical description that constrains the possible musical events that can be generated. The definition here is deliberately broad to suggest the continuous spectrum that exists in the levels of detachment between what the composer creates and the actual sounds produced.²

Every algorithmic composition approach can be placed in a continuum between ad hoc design and music modeling. Ad hoc designs are those where the composer invents or borrows algorithms with no particular music in mind. The composer doesn't necessarily have a mental image of what the musical output will be. The approach is something like: "Here's an algorithm; I wonder what this would sound like." In music modeling, the composer deliberately attempts to generalize the music he hears in his mind, so the algorithm is a deliberate codification of some existing music.
We find examples of ad hoc approaches as early as 1029, in music theorist Guido d'Arezzo's Micrologus. Guido discusses a method for automatically composing melodies from any text by assigning each note in the pitch scale to a vowel [26]. Because there are more pitches than vowels, the composer is still free to choose among the multiple pitch options available. Similarly, Miranda borrows preexisting algorithms from cellular automata and fractal theory for automatic music generation [28].

¹ For a complete definition of algorithm and a discussion of how algorithms relate to music composition, see [26].
² One could argue that any composition is algorithmic, since the composer does not define the specific wave-pressure changes, only the mechanisms to produce them.

While inventing ad hoc algorithms for music composition is a fascinating endeavor, in this work we are interested in learning about our intuitive musical creativity and in developing a generative system that grows from musical examples. Thus, we discuss the design of a generative music system based on modeling existent music.

1.3 Modeling

    All models are wrong, but some are useful.
    — George E. P. Box

There are no best models per se. The most effective model will depend on the application and goal, and different goals suggest different approaches to modeling. As expressed in our motivations, our models intend to serve a double purpose. The first is to obtain some understanding of the inner workings of a given piece and, hopefully, gain insight into the composer's mind. The second is for the models to be a powerful composition tool. It is difficult, if not impossible, to come up with a model that achieves these two goals simultaneously for a variety of pieces, because a generative model might not be the most adequate for analysis and vice versa.
Music structure is so varied, so diverse, that it seems unlikely that a single modeling approach could be used successfully for all music and for all purposes. What are the criteria for choosing a model? Is the model simple? The Minimum Description Length principle [33], which essentially defines the best model as the one that is smallest with regard to both its form and its parameter values, is one measure of this criterion. Other criteria to consider are:

Robustness: Does it lend itself well to a variety of data (e.g. musical pieces)?
Prediction: Can it accurately predict short term or long term events?
Insight: Does it provide new meaningful information about the data?
Flexibility: As a generative system, what is the range or variety of new data that the model can generate?

1.3.1 Models of Music

Deduction vs. Induction

Brooks et al. describe two contrasting approaches to machine modeling: the inductive and the deductive [4]. Essentially, the difference lies in who performs the analysis and the generalization: the human programmer or the machine. In a deductive model, we analyze a piece of music and draw some rules and generalizations from it. We then code the rules and generalizations, and the machine deduces the details to generate new examples. In an inductive approach, the machine does the generalization. Given a piece (or set of pieces) of music, the machine analyzes and learns from the example(s) to later generate novel pieces.

While many pieces may share common features, each piece of music has its own particular structure and "logic". A deductive approach implies that one must derive the general constants and particularities of a piece or set of pieces for the subsequent deduction by the machine. This is a time-consuming task that could be done for only a small set of pieces in a lifetime. The classic music analysis paradigm is at the root of this approach.
One can certainly learn a lot about music in this way, but it seems to us that attempting to have the machine automatically derive the structure and the generalization is not only a more interesting and challenging problem; it also might shed light on the way we learn and on human cognition in general. This approach also encourages one to take a more general view of music and to be as unbiased as possible (hopefully changing our own views and biases in the process). In a deductive approach we are filtering the data: we are telling the machine how to think about music and how to process it. In an inductive approach the attempt is to have the machine figure out what music is.

We could alternatively attempt to model our own creativity directly, but it seems to us that the "logic" or structure and generative complexity of the highly subconscious and hardly predictable creative mind is at a far reach from our conscious self-probing and introspection.³ Rather than asking ourselves what might be going on in our mind while we imagine a new piece of music and trying to formalize the creative process, we can let our imagination run free, without probing, and then have the machine analyze and model the created object.

³ This nebulous and partly irrational experience of the imaginary has been expressed many times by different composers, for example [34, 15].

Continuous vs. Discontinuous

Most generative music systems and analysis methods assume a discrete⁴ (and most times finite) musical space. This is a natural assumption, since the dominating musical components in western music have been represented with symbols for discrete values.⁵ The implication is that most models of music depart from the idea that a piece is a sequence of symbols taken from a finite alphabet. Almost all the generative systems we are aware of are discrete [21, 4, 19, 38, 12, 8, 9, 41, 7, 32, 30, 29, 3].
This thesis, however, proposes a continuous approach to modeling music structure.

Deterministic vs. Probabilistic

    There is no way of proving the correctness of the position of 'determinism' or 'indeterminism'. Only if science were complete or demonstrably impossible could we decide such questions.
    — Mach (Knowledge and Error, Chapter XVI.11)

Should a music structure model be deterministic or stochastic? Consider two contrasting musical examples. On one extreme there is Steve Reich's Piano Phase. The whole piece consists of two identical periodic patterns played at slightly different tempi (or frequencies).⁶ Piano Phase can be straightforwardly understood as a simple linear deterministic stationary system. On the other extreme we can place Xenakis' string quartet ST/4, 1-080262. It would make sense to model ST/4, 1-080262 stochastically, since we know it was composed in this way! In between these two extremes there is a huge variety of compositions with much more complex and intricate structures. There may be pieces that start with clearly perceivable repeating patterns and gradually evolve into something apparently disorganized. A piece like this might best be modeled as a combination of deterministic and stochastic components that are a function of time.

⁴ Since the classical period, notation for loudness has commonly included symbols for continuous transitions, but loudness is typically not considered important enough to be studied. If it is, its continuous nature is typically ignored.
⁵ Indeed, the recurrent inclusion of the continuum in notated pitch space (starting probably with Bartók, continuing with Varèse and Xenakis, and culminating in Estrada) has made it difficult or impossible for some music theoretic views to approach these kinds of music.
⁶ The point of the piece is the perception of the evolution of the changing phase relationships between the two patterns.
In addition, these combinations may occur at multiple time scales, in which case the piece would be better modeled as a combination of a deterministic component at one level and a stochastic component at another. Thus, rather than trying to find a single global model for a whole piece, we might want to model a piece as a collection of multiple, possibly different, models.

Brief Thoughts on Markov Models

It is intriguing to see how the great majority of the inductive machine models of music are discrete Markov models. Recall that a kth-order Markov model of a time series is a probabilistic model in which the probability of a value at time step n + 1 in a sequence s is conditioned on the k previous values:

    p(s_{n+1} | s_n, s_{n-1}, ..., s_{n-k+1})

Why discrete, and why Markovian? Several papers on Markov models of music are based on the problem of reducing the size and complexity of the conditional probability tables by the use of trees and variable orders ([41, 3, 30]). Again, the discrete nature of the models most probably comes from the view of music as a sequence of discrete symbols taken from an alphabet. Why not model the probability density functions parametrically, as with mixtures of Gaussians?

Are Markov models really good inductive models of music? What kind of generalization can they make? Most applications of discrete Markov models estimate the probability functions from the training data. Usually, zero probabilities are assigned to unobserved sequences. If this is the case, it is impossible for a simple kth-order Markov model to generate any new sequences of length k + 1; all new and original sequences will have to be of length k + 2 or greater. Take for example the following simple sequence, which we assume to be infinite:

    1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 3, 2, ...    (1.1)

A first order model of this sequence can be constructed statistically by counting the relative frequencies of each value and of each pair of values.
The relative frequencies of all possible pairs derived from this sequence can be easily visualized in the following table, where each fraction represents the number of times a row value is immediately followed by a column value, divided by the total number of consecutive sample pairs found in the sequence.

Table 1.1: Relative frequency of all pairs of values found in sequence 1.1

          1     2     3
    1     0    1/4    0
    2    1/4    0    1/4
    3     0    1/4    0

From these statistics we can now derive the marginal probabilities of individual values, and by Bayes' rule we find the conditional probabilities:

    p(s_{n+1} | s_n) = p(s_{n+1}, s_n) / p(s_n)

From this table we can see that new sequences generated by this joint probability function will never produce the pairs (1,1), (1,3), (3,1), (3,3) or (2,2), nor extrapolate to other values, such as (4,5). In other words, all length-2 sequences that this model can generate are strictly subsets of sequence 1.1, while those of length 3 or greater may or may not appear in the original sequence. To overcome this limitation one could give unobserved sequences a probability greater than zero, but this is essentially adding noise.

Looking at generated sequence segments of length 3 or greater, we may now wonder how these are related to the original training pieces. The limitation of this model soon becomes apparent: new sequences generated from this model will have some structural resemblance to the original only at lengths l ≤ k + 1 (l ≤ 2 in this example), but not at greater lengths. From the first order model in the present example it is possible to obtain the following sequence:

    1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 1, 2, 1, 2, 1, 2, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 1, ...

Because p(3|2) = p(1|2) = 0.5, the generated sequence will fluctuate between 1 and 3 with equal probability every time a 2 appears. This misses what seems to be the essential quality of the sequence: its periodicity.
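The behavior described above is easy to verify in a few lines of Python. The following is an illustrative sketch (not code from the thesis): it estimates the first-order statistics of Table 1.1 from a finite number of repetitions of sequence 1.1 and then generates from the model, confirming that every generated pair was observed in training while the periodicity is lost.

```python
from collections import Counter
import random

# Training data: the periodic sequence 1, 2, 3, 2, ... of Table 1.1,
# truncated to a finite number of repetitions.
seq = [1, 2, 3, 2] * 100

# Relative frequencies of consecutive pairs (the joint statistics).
pairs = Counter(zip(seq, seq[1:]))

# Conditional distributions p(s[n+1] | s[n]) estimated from the counts.
cond = {}
for (a, b), c in pairs.items():
    cond.setdefault(a, {})[b] = c
for a in cond:
    total = sum(cond[a].values())
    cond[a] = {b: c / total for b, c in cond[a].items()}

# Generate a new sequence from the first-order model.
random.seed(1)
out = [1]
for _ in range(30):
    nxt = random.choices(list(cond[out[-1]]), list(cond[out[-1]].values()))[0]
    out.append(nxt)

# Every generated pair was observed in training: pairs such as (1, 1)
# or (3, 3) can never occur, but after each 2 the model chooses 1 or 3
# at random, so the period-4 structure is not preserved.
assert all(p in pairs for p in zip(out, out[1:]))
```

Running the sketch with different seeds produces different sequences, all locally consistent with sequence 1.1 at length 2 and almost never periodic beyond that.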
Increasing the order of the model to 2, we are able to capture the periodicity unambiguously. But what generalization exists? Now the model will generate the whole original training sequence exactly. We want the model to be able to abstract the periodicity and generate new sequences that are different from the original but that have the same essentially periodic quality. This is the key idea we explore in this work. Such information could be obtained by looking for structure and patterns in the probability functions derived from the statistics, but in this work we will present a different approach.

One more problem is that of stationarity. In our example the sequence repeats indefinitely, making a static probabilistic model adequate. But music continuously evolves and is very seldom static. Sequences generated with a simple model like the one discussed above may resemble the original at short time scales, but the large scale structures will be lost. This problem could be addressed by dynamically changing the probability functions, i.e. with a stochastic model. Modeling non-stationary signals can also be done through a hierarchical representation, where the conditional probabilities are estimated not only sequentially but also vertically, across hierarchies. Some concrete applications of this approach can be found in image processing [10]; to our knowledge, there have not been similar approaches in music modeling. Similar problems have been observed with other methods such as neural networks [29], where the output sequences resembled the original training sequence only locally, due to the note-to-note approach to modeling.

The Markov model example we have given here is certainly extremely simplistic, but hopefully it makes clear some of the problems of this approach to music modeling for the purpose of generating novel pieces.
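The claim that a second-order model merely memorizes sequence 1.1 can also be checked directly. The sketch below (again illustrative, not from the thesis) shows that every length-2 context in the sequence has exactly one possible successor, so the order-2 model is fully deterministic and can only replay its training data.

```python
from collections import Counter

seq = [1, 2, 3, 2] * 100  # the periodic training sequence 1.1

# Second-order statistics: the set of observed successors of every
# length-2 context (pair of consecutive values).
successors = {}
for a, b, c in zip(seq, seq[1:], seq[2:]):
    successors.setdefault((a, b), set()).add(c)

# Every context has exactly one successor, so the second-order model
# assigns probability 1 to a single continuation at every step.
assert all(len(s) == 1 for s in successors.values())

# Regenerating from the opening context reproduces the original exactly:
# the periodicity is captured, but no generalization remains.
out = [1, 2]
while len(out) < len(seq):
    out.append(next(iter(successors[(out[-2], out[-1])])))
assert out == seq
```

The two sketches together illustrate the dilemma stated above: at order 1 the model loses the periodicity, and at order 2 it captures the periodicity only by reproducing the training sequence verbatim.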
1.3.2 Our Approach to Music Structure Modeling

As suggested in our motivations, our main interest is the abstraction of the essential qualities of the dynamic properties of music and their use as sources for the generation of novel pieces. By dynamics we mean the qualitative aspects of motion in music, rather than the loudness component, as the term is usually used by musicians. Here we approach musical sequences in a similar way as a physicist would approach the motion of physical objects.

The notion of the dynamical properties of a musical sequence is rather abstract, and we hope to clarify it with a concrete example. First, consider the following question: why can we recognize a tune even when replacing its pitch scale (e.g. changing from major to minor), or when the pitch sequence is inverted? What are the constants that are preserved that allow us to recognize the tune? Take for example the first 64 notes (8 measures) of Bach's Prelude from his Cello Suite no. 1 (Figure 1-1). There are multiple elements that we can identify in the series: the pitches, the durations, the scales and harmonies the pitches imply, the pitch intervals and the temporal relationships between these elements. How can we characterize these temporal relationships? Can we say something about the regularity/irregularity of the sequence, the changes of velocity, the contour? An abstract representation of the motion in Bach's Prelude might look something like (up, up, down, up, down, up, down, down, ...) repeating several times.

Figure 1-1: Top: First 8 measures of Bach's Prelude from Cello Suite no. 1. Bottom: Alternative notation of the same 8 measures, with the note sequence plotted as connected dots to make the contour and the cyclic structure of the sequence more apparent.
This is a useful but coarse approximation, preserving only the ordering of intervals. We might also want to preserve some information about the size of the intervals, their durations and the quality of the motion from one point to the next: is it continuous, is it discrete? Whatever the case, we see from Figure 1-1 (Bottom) that this general sequence is the only structure in the Prelude fragment, repeating and transforming gradually to match some harmonic sieve that changes every two measures. Every two measures the sequence is slightly different, but in essence the type of motion of the eight measures is the same. Thus, we can consider the sequence as being composed of two independent structures: the dynamics and the sieves through which these are filtered or quantized.

In this thesis we focus on the analysis and modeling of these dynamic properties. We address ways in which we can decompose, transform, represent and generalize the dynamics for the purpose of generating new ones. Rather than trying to extend the Markovian model as suggested above, we approach the problem from a deterministic perspective. We explore the use of a method that allows for a geometric representation of time series, and discuss different ways to model and generalize the geometry deterministically.

CHAPTER TWO
Musical Representation

Structure in music spans about seven orders of magnitude, from approximately 0.0001 seconds (10 kHz) to about 1,000 seconds (~17 min.) [11].¹ In the present work we focus on the upper four orders (between 0.1 secs. and 1,000 secs.) for two reasons. First, these orders correspond to those typically represented in musical scores. Second, there is a clear difference between the way we perceive sound below and above a threshold of approximately 0.02 seconds. Below this threshold, we hear pitch and timbre, and above it we hear rhythm and sound events or streams.
The perception can be so clearly different that the two feel like totally independent things: a high-level control signal driving high-speed pressure changes. These high-level control signals are what we are interested in modeling and transforming. Thus, the musical representations we will use are not representations of the actual sound, but abstract representations of these high-level signals.

As of today, it is still very difficult to extract these high-level control signals from the actual audio signal. Good progress has been made in tracking the pitch of individual monophonic instruments, and in some special cases from polyphonic textures, but the technology is still far from achieving the audio segregation we would like. Therefore, except for simple cases where the evolution of pitch, loudness and some timbral features can be relatively well extracted (such as simple monophonic pieces), our point of departure is the musical score.

Two basic types of scores exist. In the first, the score represents the evolution of perceptual components, such as pitch, loudness, timbre, etc. In the second, traditionally called tablature notation, the score is a representation of the performance techniques required to obtain a particular sound. In essence, both are control signals driving some salient feature of sound or some mechanism for its production. There are several ways in which these high level control signals can be represented before being modeled and transformed. Some representations will be more appropriate than others depending on their use and the type of music to be represented.

Representation of Time

The simplest and most common way of representing a digitized scalar signal is as a succession of values at equal time intervals:

    s[n] = s_0, s_1, ..., s_n    (2.1)

Since it is assumed that all samples share the same duration, this information need not be included in the series.
The same is true for a multidimensional signal where each dimension describes the evolution of a musical component. For example, a series describing the evolution of pitch, loudness and brightness might look like this:

    s[n] = [ p_0, p_1, ..., p_n
             l_0, l_1, ..., l_n
             b_0, b_1, ..., b_n ]    (2.2)

For the specific case where there is more than one instrument or sound source, as in a three voice fugue, a chorale or even possibly an entire orchestra, the series can again be extended as:

    s[n] = [ p_{a,0}, p_{a,1}, ..., p_{a,n}
             l_{a,0}, l_{a,1}, ..., l_{a,n}
             b_{a,0}, b_{a,1}, ..., b_{a,n}
             ...
             p_{m,0}, p_{m,1}, ..., p_{m,n}
             l_{m,0}, l_{m,1}, ..., l_{m,n}
             b_{m,0}, b_{m,1}, ..., b_{m,n} ]    (2.3)

While this is a simple representation scheme, it is not necessarily the best in all cases. The problem with this representation has to do primarily with the temporal overlapping of events. Consider a polyphonic instrument such as the piano. With this instrument it is possible to articulate multiple notes at the same time. How should we represent a sequence of pitches that overlap and start and end at different times? An alternative is to have a series with as many dimensions as the number of keys on the piano, where each dimension represents the evolution of the velocity of the attack and release of each key:

    s[n] = [ v_{1,0}, v_{1,1}, ..., v_{1,n}
             v_{2,0}, v_{2,1}, ..., v_{2,n}
             ...
             v_{m,0}, v_{m,1}, ..., v_{m,n} ]    (2.4)

How could we reduce the number of dimensions and still have a meaningful representation? A tempting idea might be to separate a piano piece into multiple monophonic voices and to assign each voice to a dimension in a multidimensional polymelodic series as in 2.3. But in addition to being an arbitrary decision in most cases (not all piano pieces are conceived as a counterpoint of melodies), the main problem is that even single melodic lines may still contain overlapping notes through legatissimo articulation.
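Representation 2.4 is essentially a piano roll: one dimension per key, with each row tracing that key's activity over time. A small sketch (invented data, assuming NumPy; not code from the thesis) shows how overlapping notes are captured without any voice-separation decision:

```python
import numpy as np

# One row per key (88 for a piano), one column per time step.
# Nonzero entries stand for a depressed key, holding the (invented)
# attack velocity for the duration of the note.
n_keys, n_steps = 88, 16
v = np.zeros((n_keys, n_steps))

# Two legatissimo notes in a single melodic line: D4 is struck while
# C4 (key index 39, counting from A0 = 0) is still sounding.
v[39, 2:7] = 0.9   # C4 held from step 2 through step 6
v[41, 5:10] = 0.7  # D4 attacked at step 5, overlapping C4

# The overlap is visible directly as two simultaneously nonzero rows;
# no assignment of notes to voices was needed.
overlap = (v[39] > 0) & (v[41] > 0)
assert overlap.any()
```

The price, as noted above, is dimensionality: 88 dimensions, almost all of them zero at any given moment, which motivates the more compact event-based representation that follows.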
An alternative representation is the Standard MIDI File (SMF) approach, where time is included explicitly in the representation by indicating the absolute position of each event in time or the time difference: the Inter Onset Interval (IOI). In addition to this time information there are the duration of each event and the component values. The most economical form of this representation, Format 0, combines all separate sources or instruments into a few dimensions, one of which specifies the instrument to which the event corresponds. The following is an example of this format, where ioi stands for Inter Onset Interval, d for duration, p for pitch, v for velocity (or loudness) and ch for channel (or instrument):

$$s[n] = \begin{bmatrix} ioi_0, & ioi_1, & \ldots, & ioi_n \\ d_0, & d_1, & \ldots, & d_n \\ p_0, & p_1, & \ldots, & p_n \\ v_0, & v_1, & \ldots, & v_n \\ ch_0, & ch_1, & \ldots, & ch_n \end{bmatrix} \qquad (2.5)$$

With this format we can now represent overlapping events with only a few dimensions. This is an adequate representation for the piano, due to the discrete nature and limited control possibilities of the instrument. Yet, it is far from being a good numeric replacement for a traditional music score in general, the main problem being the loss of independence between the multiple parameters. Because the explicit representation of time intervals can be useful, we can still take advantage of this feature by defining pairs of dimensions for each parameter: one for the parameter values and another for their durations. Besides providing some insight into the rhythmic structure of a sequence, it can also significantly reduce the number of data samples:

$$s[n] = \begin{bmatrix} p_0, & p_1, & \ldots, & p_n \\ d_0, & d_1, & \ldots, & d_n \\ v_0, & v_1, & \ldots, & v_n \\ d_0, & d_1, & \ldots, & d_n \end{bmatrix} \qquad (2.6)$$

While this representation lends itself best to discrete data, we can assume some kind of interpolation between key points and still make it useful for continuous data.

Summary

There are multiple ways in which musical data can be represented.
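As a concrete illustration of the event-based representation of equation (2.5) (a sketch; NumPy assumed, and the three events are invented), each event carries its own IOI, duration, pitch, velocity and channel:

```python
import numpy as np

# One structured record per event, mirroring the rows of equation (2.5).
event_dtype = np.dtype([("ioi", "i4"), ("d", "i4"), ("p", "i4"),
                        ("v", "i4"), ("ch", "i4")])
events = np.array([(0, 480, 60, 90, 0),    # C4 on channel 0
                   (240, 480, 64, 80, 0),  # E4 starts 240 ticks later, overlapping C4
                   (240, 240, 67, 70, 1)], # G4 on a second channel
                  dtype=event_dtype)

# Absolute onset times are the cumulative sum of the inter-onset intervals.
onsets = np.cumsum(events["ioi"])
```

Note how overlap falls out naturally: the second event begins before the first one (duration 480) has ended, something the fixed-rate matrix of equation (2.2) cannot express directly.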
Different representations have different properties: some provide compact representations, while others are more flexible. In addition, some may be more informative about certain aspects of the music than others. The choice of representation will also depend on the type of transformation we are interested in applying to the data. In the present work we use these three basic representations (constant sampling rate representation, variable sampling rate representation, and SMF Format 0) in different situations for the purpose of analyzing and generating new musical sequences.

CHAPTER THREE
State Space Reconstruction and Modeling

3.1 State Spaces

One of the most important concepts in this work is that of state space (or phase-space). A state space is the set of all possible states or configurations available to a system. For example, a six-sided die can be in one of six possible states, where each state corresponds to a face of the die. Here the state space is the set of all six faces. As a musical example consider a piano keyboard with 88 keys. All possible combinations of chords (composed from 1 to 88 keys) on this keyboard constitute the state space, and each chord is a state. These two examples constitute finite state spaces because they have a finite number of states. Some systems may have an infinite number of states; for example, a rotating, height-adjustable chair. A chair that can rotate on its axis and that can be raised or lowered continuously has an infinite number of possible positions. Yet, in practical terms, many of these positions are so close to one another that sometimes it makes sense to discretize the space and make it finite (for example, by grouping all rotations within a range of $\pi$ radians into one state). While this system has an infinite number of possible states, it has only two degrees of freedom: up-down motion and azimuth rotation.
State spaces can be represented geometrically as multidimensional spaces where each point corresponds to one and only one state of the system. In the chair example, since only two degrees of freedom exist, we can represent the whole state space in a two-dimensional plane. Yet, we might like to keep the relations of proximity between states in the representation as well. Therefore we embed the two-dimensional state space of the rotating chair in a three-dimensional space in such a way that the states that are physically close to each other are also close in state space. This results in a cylinder, and is a "natural" way of configuring the points in the state space of the chair. An analogous musical example is that of pitch space. This one-dimensional space can be represented with a line, ranging from the lowest perceivable pitch to the highest. But this straight line is not a good representation of human perception of pitch in terms of similarity. Certain frequency ratios like the octave (2:1) are perceived to be equivalent or more closely related than, for example, minor seconds. In 1855 Drobisch proposed representing pitch space as a helix, placing octaves closer to each other than perceptually more distant intervals [35]. Thus, we have a case similar to that of the chair, where a low-dimensional space is embedded in a higher-dimensional Euclidean space to represent similar states by proximity (Figure 3-1).

Figure 3-1: Helical representation of pitch space. A one-dimensional line is embedded in a three-dimensional Euclidean space for a better representation of pitch perception.

3.2 Dynamical Systems

We have discussed the notion of state space and have given a few examples of its representation. But for the case of music, it is the temporal relationships between states that we are most interested in (i.e. their dynamics), rather than the states themselves.
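Before turning to dynamics, the helical pitch space of Figure 3-1 can be sketched in a few lines (the radius and rise per octave are arbitrary choices of this sketch, not values from the text):

```python
import math

def pitch_helix(pitch, radius=1.0, rise=1.0):
    """Map a pitch (in semitones) to a point on a Drobisch-style helix.

    One full turn corresponds to an octave (12 semitones), so pitches an
    octave apart lie directly above one another on the cylinder.
    """
    angle = 2.0 * math.pi * (pitch % 12) / 12.0
    return (radius * math.cos(angle),
            radius * math.sin(angle),
            rise * pitch / 12.0)

# Octave-related pitches share x and y; only the height differs.
c4 = pitch_helix(60)
c5 = pitch_helix(72)
```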
Given this interest, it comes as no surprise that our first approach to studying the qualitative characteristics of motion in music comes from the long tradition of the study of dynamical systems in physics. We will not discuss here the history of this vast discipline, but only present some key ideas and important results that will help us analyze, model and generalize musical structures from which we will hopefully generate novel pieces of music.

3.2.1 Deterministic Dynamical Systems

We ought then to regard the present state of the universe as the effect of its anterior state and as the cause of the one which is to follow. Given for one instant an intelligence which could comprehend all the forces by which nature is animated and the respective situation of the beings who compose it (an intelligence sufficiently vast to submit these data to analysis) it would embrace in the same formula the movements of the greatest bodies of the universe and those of the lightest atom; for it, nothing would be uncertain and the future, as the past, would be present to its eyes.

Pierre-Simon Laplace (Philosophical Essay on Probabilities, II. Concerning Probability)

A deterministic dynamical system is one where all future states are known with absolute certainty given a state at an instant in time. If the future state of a system depends only on the present state, independently of the time at which the state is found, then the system is said to be autonomous. Thus, the dynamics of an autonomous system are defined only in terms of the states themselves and not in terms of time.
In the case of an m-dimensional state space, the dynamics of an autonomous deterministic system can be defined as a set of m first-order ordinary differential equations for the continuous case (also called a flow) [22]:

$$\frac{d}{dt}\mathbf{x}(t) = \mathbf{f}(\mathbf{x}(t)), \quad t \in \mathbb{R} \qquad (3.1)$$

or as an m-dimensional map for the discrete case:

$$\mathbf{x}_{n+1} = \mathbf{F}(\mathbf{x}_n), \quad n \in \mathbb{Z} \qquad (3.2)$$

3.2.2 State Space Reconstruction

What if we know nothing about the nature of a system, and all we have access to is one of its observables? Consider for example the flame of a candle as our system of interest. Suppose that, similarly to Plato's cave of shadows from his Republic, Book VII, we do not see the candle directly, but only the motion of shadows of objects projected on the wall. Thus, we have a limited amount of information about the candle's motion and all its degrees of freedom. Can we infer the motion of the candle's flame in its entirety from the motion of the shadows? Suppose now that our observation is a piece of music; it is the shadow moving on the wall. We now want to know what the nature of the "system" that generated the given piece of music is. Can we automatically infer the hidden structure of the mind responsible for the generation of the piece and, from this structure, generate other musical possibilities? Evidently, in the context of this work, this example should not be taken literally, but it conveys the idea. We now state the question more formally. Given a series of observations $s(t) = h(\mathbf{w}(t))$ that are a function of the system $W$, is it possible to reconstruct the dynamics of the state space of the system from these observations? Can we find $h^{-1}(s(t))$? If $s(t)$ is a scalar observation sequence, is it possible to recover the high-dimensional state space that generated the observation? Takens' theorem [40] states that it is possible to reconstruct a space $X \subset \mathbb{R}^m$ that is a smooth diffeomorphism (topologically equivalent) of the true state space $W \subset \mathbb{R}^d$ by the method of delays.
This method consists of constructing a high-dimensional embedding by taking multiple delays of $s(t)$ and assigning each delay to a dimension in the higher-dimensional space $X$:

$$\mathbf{x}(t) = (s(t), s(t-\tau), \ldots, s(t-(m-1)\tau)) = (h(\mathbf{w}(t)), h(\mathbf{w}(t-\tau)), \ldots, h(\mathbf{w}(t-(m-1)\tau))) = H(\mathbf{w}(t))$$

The theorem states that this is true if the following conditions are met:

1. System $W$ is continuous with a compact invariant smooth manifold $A$ of dimension $d$, such that $A$ contains only a finite number of equilibria, contains no periodic orbits of period $\tau$ or $2\tau$, and contains only a finite number of periodic orbits of period $p\tau$, with $3 \le p \le m$.
2. $m > 2d + 1$.
3. The measurement function $h(\mathbf{w}(t))$ is $C^2$.
4. The measurement function couples all degrees of freedom.

The condition that $m > 2d + 1$ guarantees that the embedded manifold does not intersect itself. In many cases a value of $m$ smaller than $2d + 1$ will work because the number of effective degrees of freedom of the system may be smaller due to dissipation or correlation between the dimensions in state space [37].

3.2.3 Calculating the Appropriate Embedding Dimension

How do we decide on a good dimension $m$ for the embedding? How do we know if the dimensionality we've chosen for the embedding is high enough to capture all the degrees of freedom of the system? For deterministic time series we want each state to have a unique velocity associated with it, i.e. if $x_n = x_k$ then $F(x_n) = F(x_k)$. This implies that for the case of continuous time series, trajectories should not cross in state space. In other words, points that are arbitrarily close to each other, both in space and time, will have very similar velocities.

Correlation Sum

A simple way of estimating a good embedding dimension for state space reconstruction of deterministic systems is to find points that overlap or that are closer than $\epsilon$ to each other.
The correlation sum simply counts the number of distances smaller than $\epsilon$ between all pairs of points $(x_i, x_j)$:

$$C(\epsilon) = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \Theta(\epsilon - \|x_i - x_j\|) \qquad (3.3)$$

Here $\Theta(x)$ is the Heaviside function, which returns 1 if $x > 0$ and 0 if $x \le 0$. Throughout the text we use the Euclidean distance metric:

$$\|x - y\| = \sqrt{(x_1 - y_1)^2 + \cdots + (x_m - y_m)^2}$$

A good embedding dimension will be one where the correlation sum is very small in proportion to the total number of points. Because it is not clear what a good choice of $\epsilon$ is, the correlation sum is evaluated for different values of $\epsilon$ and different embedding dimensions $m$. Specifically for discrete series, with $\epsilon$ smaller than the quantization resolution $\delta$, we can count the number of points that fall in the same place. Thus, when we want a strictly bijective relation between the embedded trajectory and the observation sequence, we choose the smallest embedding dimension where the correlation sum equals zero for $\epsilon < \delta$, i.e. $C(\epsilon < \delta) = 0$.

False Nearest Neighbors

A more sophisticated method for estimating the embedding dimension is the false nearest neighbors method [23]. This algorithm consists of comparing the distance between each point and its nearest neighbor in an $m$-dimensional lag space to the distance of the same pair of points in an $(m+1)$-dimensional space. If the distance between the pair of points grows beyond a certain threshold after changing the dimensionality of the space, then we have false nearest neighbors. Kantz [22] gives the following equation for their estimation:

$$X_{fnn}(r) = \frac{\sum_{n} \Theta\!\left(\dfrac{\|x_n^{(m+1)} - x_{k(n)}^{(m+1)}\|}{\|x_n^{(m)} - x_{k(n)}^{(m)}\|} - r\right) \Theta\!\left(\dfrac{\sigma}{r} - \|x_n^{(m)} - x_{k(n)}^{(m)}\|\right)}{\sum_{n} \Theta\!\left(\dfrac{\sigma}{r} - \|x_n^{(m)} - x_{k(n)}^{(m)}\|\right)} \qquad (3.4)$$

where $x_{k(n)}^{(m)}$ is the nearest neighbor of $x_n$ in $m$ dimensions and $\sigma$ is the standard deviation of the data. The first Heaviside function in the numerator counts the neighbors whose distance grows beyond the threshold $r$, while the second one discards those whose distance was already greater than $\sigma/r$.
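The correlation-sum criterion above can be sketched as follows (NumPy assumed; a minimal, unoptimized implementation): embed the series by the method of delays, then count the fraction of point pairs closer than $\epsilon$, as in equation (3.3).

```python
import numpy as np

def delay_embed(s, m, tau):
    """Method of delays: row i is (s_t, s_{t-tau}, ..., s_{t-(m-1)tau})."""
    s = np.asarray(s, float)
    start = (m - 1) * tau
    n = len(s) - start
    if n <= 0:
        raise ValueError("series too short for this m and tau")
    return np.column_stack([s[start - j * tau : start - j * tau + n]
                            for j in range(m)])

def correlation_sum(X, eps):
    """Equation (3.3): fraction of pairs (x_i, x_j), i < j, closer than eps."""
    X = np.asarray(X, float)
    N = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    iu = np.triu_indices(N, k=1)          # all pairs with i < j
    return 2.0 * np.count_nonzero(d[iu] < eps) / (N * (N - 1))
```

In practice one would evaluate `correlation_sum(delay_embed(s, m, tau), eps)` over a grid of `m` and `eps` values and, for quantized data, pick the smallest `m` for which the sum vanishes below the quantization resolution.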
Cao [5] proposed a more elegant measure of false nearest neighbors. He eliminates the threshold value $r$ by considering only the average of the changes in distance between neighboring points as the dimension increases, and then taking the ratio of these averages. He defines the mean

$$E(m) = \frac{1}{N - m\tau} \sum_{n} \frac{\|x_n^{(m+1)} - x_{k(n)}^{(m+1)}\|}{\|x_n^{(m)} - x_{k(n)}^{(m)}\|} \qquad (3.5)$$

and $E1(m) = E(m+1)/E(m)$ as the ratio between the averages of consecutive dimensions. The ratio function $E1(m)$ describes a curve that grows more slowly as $m$ increases, asymptotically reaching 1. The value of $m$ where the growth of the curve becomes very small is the embedding dimension we are looking for.¹ Typically the value selected is slightly above the knee of the curve (Figure 3-2).

Figure 3-2: Example of a typical behavior of E1(m). The function reaches 1 asymptotically, and a good choice for the embedding dimension is above the knee of the curve, where its velocity decreases substantially.

3.2.4 Calculating the Time Lag $\tau$

While Takens' theorem tells us what the minimum embedding dimension for state space reconstruction should be, it says nothing about how to choose the delay $\tau$ for the embedding. In theory, the choice of $\tau$ is irrelevant since the reconstructed state space is topologically equivalent to the system's manifold. In practice, though, factors such as observation noise and quantization make it necessary to find a good choice of $\tau$ for model estimation and prediction. To better understand what a good choice of $\tau$ would be, we take as example one of the most popular systems in nonlinear dynamics: the Lorenz attractor.

¹ This assumes that the time series comes from an attractor. If the series is noise, for example, E1(m) will never converge.
The dynamics of the system are defined as

$$\dot{x} = \sigma(y - x)$$
$$\dot{y} = \rho x - y - xz$$
$$\dot{z} = -\beta z + xy$$

Figure 3-3 shows 4000 points of the trajectory of this attractor, with parameters $\sigma = 10$, $\rho = 28$, $\beta = 2.666\ldots$, and initial position $x = 0$, $y = 0.01$ and $z = -0.01$. Suppose our only observable is the $x$ axis of the system (Figure 3-4). We construct the lag space from this observable as $\mathbf{x}_n = (x_n, x_{n-\tau}, x_{n-2\tau})$. Figure 3-5 shows the reconstructed state space from $x$ for different values of $\tau$.

Figure 3-3: Lorenz attractor with parameters $\sigma = 10$, $\rho = 28$, $\beta = 2.666\ldots$

For $\tau = 1$, all the points in the reconstructed space fall along the diagonal. $\tau = 1$ is clearly too small because the coordinates that compose each point are highly correlated and almost identical. As $\tau$ is increased, the embedding unfolds to reveal the structure of the attractor, which is clearest for values of $\tau$ between 8 and 14. As $\tau$ continues to increase, the geometry gradually distorts into an overcomplicated structure. While visual inspection is a good way of estimating the delay parameter, in many cases the dimensionality of the state space must be greater than three (apart from the fact that we ignore the shape of the real state spaces that generated the observation). Thus, a quantitative measure to estimate the appropriate value of $\tau$ is required. The method we will use is based on the above observation about the correlation that exists between coordinates in the reconstructed state space.

Figure 3-4: Evolution of the x dimension (our observation) of the Lorenz system. The abscissa shows the sample steps of the sequence while the ordinate shows the actual value of the x axis of the Lorenz attractor.

If coordinates are highly correlated, then the reconstructed trajectory will fall along the main diagonal of the space.
This means that we want to find a $\tau$ large enough to make the correlations minimal yet small enough to avoid distortion of the reconstructed trajectory. A measure of the mutual information between a series and itself delayed by $\tau$ provides us with a reasonable answer to this question [24]. Let $p_{s_n}(i)$ be the probability that the series $s[n]$ takes the value $i$ at any time $n$, and $p_{s_n, s_{n-\tau}}(i, j)$ be the probability that the series $s[n]$ takes the values $i$ at time $n$ and $j$ at time $n - \tau$, for all $n$. The mutual information is then:

$$I(s_n; s_{n-\tau}) = \sum_{i,j} p_{s_n, s_{n-\tau}}(i, j) \log_2 \frac{p_{s_n, s_{n-\tau}}(i, j)}{p_{s_n}(i)\, p_{s_{n-\tau}}(j)} \qquad (3.6)$$

Figure 3-6 is a plot of the mutual information of the observation $x$ of the Lorenz system as a function of $\tau$. The first minimum of the mutual information function is our choice of $\tau$.

Figure 3-5: Three-dimensional embedding of the observable x[n] by the method of delays with different values of $\tau$. From top to bottom, left to right: $\tau$ = 1, 2, 4, 8, 14, 20, 32, 64.

Figure 3-6: Estimation of the value for $\tau$ by mutual information. The first local minimum is the best estimate of $\tau$.

3.3 Modeling

3.3.1 Spatial Interpolation = Temporal Extrapolation

A trajectory in the reconstructed state space informs us about states visited by the system, as well as their temporal relationships. How do we generate new trajectories (new state sequences) that have similar dynamics to the reconstructed state space trajectory? What other state sequences are implied by this trajectory? Given a new state $x_n$ not found
in the original state space trajectory, what would be the "natural" or implied sequence of states $x_{n+1}, x_{n+2}, x_{n+3}, \ldots$? Paraphrasing in more musical terms, if our state space is a reconstruction of an actual piece of music, how do we generate new pieces of music that have similar motion characteristics to the original piece, yet are different at the same time? What other musical sequences are implied by the structure of the embedding of the original piece? These questions are intimately related to the problem of prediction, and in a reconstructed state space prediction becomes a problem of interpolation [39]. There are multiple ways to interpolate the space and different modeling techniques can be used. We can consider a single global predictor valid for all $x_n$, or a series of local predictors which are valid only locally in the vicinity of the point of interest.

Local Linear Models

Probably the simplest interpolation method is the method of analogues proposed by Lorenz in 1969 [25]. Let $x_{(i)n}$ be the $i$th nearest neighbor of $x_n$. The method of analogues consists of finding the nearest neighbor $x_{(1)n}$ of $x_n$, and equating $x_{n+1}$ to $x_{(1)n+1}$. In other words, $F(x_n) = F(x_{(1)n})$. An obvious improvement over this method is to calculate $x_{n+1}$ from a number of nearest neighbors. The number of neighbors can be defined by a radius $\epsilon$ around the point $x_n$ (in which case this number would vary depending on the density of the state space around $x_n$), or by choosing a fixed number of neighbors $N$. For both cases, the combination of the neighbor points of $x_n$ can be weighted by their distances to $x_n$. An estimation of $x_{n+1}$ from the weighted average of the $N$ nearest neighbors of $x_n$ can be expressed as

$$F(x_n) = \sum_{i=1}^{N} F(x_{(i)n})\, \phi(\|x_n - x_{(i)n}\|) \qquad (3.7)$$

where $\phi$ is a weighting function [25]. In our present implementation $\phi$ is defined as

$$\phi_i = \frac{\delta_i}{\sum_{j=1}^{N} \delta_j}, \qquad \delta_i = \frac{1}{\|x_n - x_{(i)n}\|^p}$$

where $p$ is a parameter used to vary the weight ratios of the states involved.
Expanded, equation (3.7) reads

$$F(x_n) = \phi_1 F(x_{(1)n}) + \phi_2 F(x_{(2)n}) + \cdots + \phi_N F(x_{(N)n}) \qquad (3.8)$$

$\phi$ returns values between 0 and 1, and $\phi_1 + \phi_2 + \cdots + \phi_N = 1$. In other words, only the relative distances between point $x_n$ and its $N$ nearest neighbors are considered, not their absolute distances. As a simple example of how this method performs, we calculate the flow of the entire state space by applying this formula to equally spaced points in the space. The two-dimensional state space was constructed from the sinusoid $10\sin(2\pi 40 n) + 13$ using the method of delays, with $\tau = 6$. Figure 3-7 shows the original time series $s[n]$ and Figure 3-8 the result of the interpolation of the reconstructed state space. In this figure we see that all the points in state space converge to the original trajectory. In terms of the generation of new trajectories with similar characteristics to the original, this result is not useful. We want states in our reconstructed space to behave similarly to their nearest neighbors, not to be followed by the same states as their neighbors. Therefore, we modify our function by replacing $F(x_{(i)n})$ with $\dot{x}_{(i)n} = F(x_{(i)n}) - x_{(i)n}$. Thus we calculate the velocity of point $x_n$ as the weighted average of the velocities of the $N$ closest points to it:

$$F(x_n) = x_n + \sum_{i=1}^{N} \dot{x}_{(i)n}\, \phi(\|x_n - x_{(i)n}\|) \qquad (3.9)$$

Figure 3-9 shows the estimated flow from this method.

Figure 3-7: Series $s[n] = 10\sin(2\pi 40 n) + 13$ with 25 samples per cycle.

Figure 3-8: Flow estimation of state space by averaging the predictions of the nearest neighbors. Parameters: N = 3, p = 2, $\tau$ = 6.

Figure 3-9: Flow estimation for the state space by weighted average of nearest neighbor velocities. Parameters: N = 3, p = 2, $\tau$ = 6.

When the observable signal $s[n]$ is by nature discontinuous (Figure 3-10), so will be the reconstructed state space trajectory. In this case, using a large $N$ will smooth out the interpolated space, distorting the characteristic angularity of the original trajectory (Figure 3-11).

Figure 3-10: Signal $s[n]$ = 5, 8, 11, 14, 11, 8, 5, 8, 11, 14, 11, 8, 5, ...
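A minimal sketch of the neighbor-velocity predictor of equation (3.9) (NumPy assumed; the tie-breaking rule for a query that lands exactly on a stored state is our own choice, not the text's):

```python
import numpy as np

def step(X, xq, N=3, p=2.0):
    """One step of the velocity-averaging predictor of equation (3.9).

    X is the embedded trajectory (rows are states), xq the query state.
    The next state is xq plus the distance-weighted average of the
    velocities x_{(i)n+1} - x_{(i)n} of its N nearest neighbors.
    """
    d = np.linalg.norm(X[:-1] - xq, axis=1)   # last point has no velocity
    idx = np.argsort(d)[:N]
    v = X[idx + 1] - X[idx]                   # neighbor velocities
    if np.any(d[idx] == 0.0):                 # exact match: copy its velocity
        return xq + v[np.argmin(d[idx])]
    w = d[idx] ** -p
    w /= w.sum()                              # relative weights phi_i sum to 1
    return xq + w @ v
```

Iterating `step` from a state near, but not on, the original trajectory generates a new sequence that moves like its neighbors rather than collapsing onto them, which is exactly the point of the modification leading to equation (3.9).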
The interpolation function is an averaging filter smoothing the state space dynamics. It is basically a Moving Average (MA) filter with coefficients changing at each new estimation point. Obviously, using only one nearest neighbor will preserve the angularity of the trajectory since no filtering takes place (Figure 3-12). A nice middle point between a totally curved (smooth) space and a linear one is the use of a large number of nearest neighbors $N$ together with an equally high $p$ exponent. This results in a generally straight flow but with rounded corners (Figure 3-13). The signals used in these examples are distant from actual music, but their simplicity makes the understanding and visualization of the concepts presented easier. In Chapter 5 we will use real music; for now just keep in mind that the observation signal $s[n]$ can be any musical component such as pitch, loudness, IOI, etc., or, as discussed in the previous chapter, even a multidimensional signal composed of all these parameters simultaneously. We have focused on a particular implementation of local linear models as a way of generalizing the reconstructed state space dynamics. We have also discussed the effect different parameter values have on the state space interpolation. The implications these have on reconstructed spaces from actual music will be discussed in Chapter 5.

Figure 3-11: Flow estimation of state space by nearest neighbor velocities. Parameters: N = 8, p = 2, $\tau$ = 1.
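As a chapter-closing sketch, the reconstruction pipeline developed above (observe a system, estimate $\tau$ by the first minimum of the mutual information, embed by the method of delays) might look as follows. The Euler integrator, step size and histogram bin count are our own choices, not the text's; the Lorenz parameters follow the example of Figure 3-3.

```python
import numpy as np

def lorenz(n, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
           w0=(0.0, 0.01, -0.01)):
    # Simple Euler integration of the Lorenz equations (a sketch only).
    w = np.empty((n, 3))
    w[0] = w0
    for i in range(n - 1):
        x, y, z = w[i]
        w[i + 1] = w[i] + dt * np.array([sigma * (y - x),
                                         rho * x - y - x * z,
                                         -beta * z + x * y])
    return w

def mutual_information(s, tau, bins=16):
    # Histogram estimate of I(s_n; s_{n-tau}) of equation (3.6), in bits.
    joint, _, _ = np.histogram2d(s[tau:], s[:-tau], bins=bins)
    pj = joint / joint.sum()
    pa = pj.sum(axis=1, keepdims=True)
    pb = pj.sum(axis=0, keepdims=True)
    nz = pj > 0
    return float((pj[nz] * np.log2(pj[nz] / (pa @ pb)[nz])).sum())

def delay_embed(s, m, tau):
    # x_n = (s_n, s_{n-tau}, ..., s_{n-(m-1)tau}), one row per point.
    start = (m - 1) * tau
    n = len(s) - start
    return np.column_stack([s[start - j * tau : start - j * tau + n]
                            for j in range(m)])

x = lorenz(4000)[:, 0]                     # the scalar observable of Figure 3-4
mi = [mutual_information(x, t) for t in range(1, 40)]
tau = len(mi)                              # fallback if no minimum is found
for k in range(1, len(mi) - 1):
    if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
        tau = k + 1                        # mi[k] corresponds to lag k + 1
        break
X = delay_embed(x, 3, tau)                 # three-dimensional reconstruction
```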
CHAPTER FOUR
Hierarchical Signal Decomposition

4.1 Divide and Conquer

Signal analysis can be characterized as the science (and art) of decomposition: how to represent complex signals as a combination of multiple simple signals. From a musicological perspective, it can be thought of as reverse-engineering composition. The power and usefulness of having a signal represented as a combination of simpler entities is easy to see. Once a decomposition is achieved, we can modify each component independently and recompose novel pieces that may share similar characteristics with the original. Just as with modeling, the best decomposition depends on its purpose. For the purpose of musical analysis, we would like the decomposition to be informative of the signal at hand and for it to be perceptually meaningful. For the purpose of re-composition, we would like the decomposition to allow for flexible and novel transformations. How can we decompose our signal in a meaningful way? Different characteristics of a signal may suggest different decomposition procedures. Frequently, a signal can be separated into a deterministic and a stochastic component. More specific components might be a trend, which is the long-term evolution of the signal, or a seasonal component: the periodic long-term oscillation [6]. Thus, multiple hierarchical overlapping structures may occur simultaneously. Particularly within the seven orders of magnitude spanned by music (see Chapter 2), these structures can vary tremendously from one time scale to the next. Because of the relevance of hierarchic structure to music and music perception, this chapter will focus on decomposition methods that reveal structure at different time scales. Before discussing these methods, we give a brief summary of some preliminary developments.
4.1.1 Fourier Transform

By far the most popular representation of a signal as a combination of simple components is the Fourier transform. This transform consists of representing a signal as a combination of sinusoids of different frequency, amplitude and phase. While it is a powerful linear transformation, it is not without its limitations. Because the basis functions of this transformation are the complex exponentials ($e^{i\omega t}$) defined for $-\infty < t < \infty$, the transform results in a representation of the relative correlation of each of the sinusoids over the whole signal, with no information about how they might be distributed over time. This is fine for stationary signals, but music is almost never stationary. Thus, the Fourier transform is not an optimal representation of music.

4.1.2 Short Time Fourier Transform (STFT)

In 1946 Gabor proposed an alternative representation by defining elementary time-frequency atoms. In essence, his idea was to localize the sinusoidal functions in order to preserve temporal information while still obtaining the frequency representation offered by the Fourier transform. This localization in time is accomplished by applying a windowing function to the complex exponential:

$$g_{u,\xi}(t) = g(t - u)e^{i\xi t} \qquad (4.1)$$

Gabor's windowed Fourier transform then becomes

$$Sf(u, \xi) = \int f(t)\, g(t - u)\, e^{-i\xi t}\, dt \qquad (4.2)$$

The energy spread of the atom $g_{u,\xi}$ can be represented by the Heisenberg rectangle in the time-frequency plane (Figure 4-1). $g_{u,\xi}$ has time width $\sigma_t$ and frequency width $\sigma_\omega$, which are the corresponding standard deviations [27]. While one would like the area of the atoms to be arbitrarily small in order to attain the highest possible time-frequency resolution, the uncertainty principle puts a limit on the minimum area of the atoms: $\sigma_t \sigma_\omega \ge 1/2$.

Figure 4-1: A Heisenberg rectangle representing the spread of a Gabor atom in the time-frequency plane.
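A discrete sketch in the spirit of equation (4.2) (NumPy assumed; the Gaussian window width and the test signal are our own choices): correlate the signal with a windowed complex exponential at a given time shift and frequency, and observe that the coefficient is large only where that frequency is actually present.

```python
import numpy as np

def gabor_coefficient(f, u, xi, width=64):
    """Inner product of signal f with a Gabor atom g(t-u)e^{i*xi*t},
    using a sampled Gaussian window centered at index u (a sketch)."""
    t = np.arange(len(f))
    g = np.exp(-0.5 * ((t - u) / width) ** 2)
    atom = g * np.exp(1j * xi * t)
    return np.vdot(atom, f)          # sum of conj(atom) * f

# A signal whose frequency changes halfway: the windowed transform sees
# energy at the first frequency early on, but not late.
n = 2048
t = np.arange(n)
f = np.where(t < n // 2, np.sin(0.2 * t), np.sin(0.5 * t))

early_at_02 = abs(gabor_coefficient(f, 400, 0.2))
late_at_02 = abs(gabor_coefficient(f, 1600, 0.2))
```

The plain Fourier transform would report energy at both frequencies with no indication that they occupy different halves of the signal; the windowed coefficients recover that temporal information.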
4.1.3 Wavelet Transform

While the STFT offers a solution to the problem of decomposing nonstationary signals, it still isn't the optimal representation of our data, for the following reasons:

1. Many musical high-level signals are discontinuous by nature. The choice of a continuous basis is not necessarily the optimal choice.
2. At high levels of musical structure, the concept of frequency has little or no meaning.
3. The STFT does not give us information about the structure of a signal at different time scales.

The development of a method for multi-resolution analysis is necessary when dealing with perceptually relevant data. Our hearing apparatus has evolved to extract multi-scale time structures from sound. Inspired by the function of the cochlea, Vercoe implemented multi-resolution sound analysis in Csound [42] by exponentially spacing Fourier matchings. In the early 1900s, Schenker proposed what is arguably his main contribution to music analysis: the abstraction of musical structures from different Schichten, or layers, at multiple scales [31]. Yet, Schenker's analysis defines the multi-scale structures in terms of tonal harmonic functions and certain specific voice leadings. Here we would like to extract the multi-scale representation more generally in terms of motion and, in addition, automatically. Thus, we import the general purpose tools traditionally used in the micro-world of sound to the macro-world of form. As a natural extension of the STFT, the wavelet transform was developed as a way to obtain a more useful multi-resolution representation. Like the STFT, the wavelet transform correlates a time-localized wavelet function $\psi(t)$ with a signal $f(t)$ at different points in time, but in addition the correlation is measured for different scales of $\psi$ to achieve a multi-scale representation. Thus, the wavelet transform of $f(t)$ at a scale $s$ and position $u$ is given by

$$Wf(u, s) = \int f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t - u}{s}\right) dt \qquad (4.3)$$
Wavelets are equivalent to hierarchical low-pass and high-pass filterbanks called quadrature mirror filters [18]. A signal is passed through both filters. Then the output of the low-pass filter is again passed through another pair of low-pass and high-pass filters, and so on recursively (Figure 4-2). The outputs of the high-pass filters are called the details of the signal and the outputs of the low-pass filters are called the approximations.¹ The number of filter pairs used in this recursive process defines the number of scales in which the signal is represented. While the "Gabor atoms" of a STFT are windowed complex exponentials, the development of the wavelet transform has introduced a wide variety of alternative basis functions. The first mention of wavelets is found in a thesis by Haar (1909) [20]. The Haar wavelet is a simple piecewise function

$$\psi(t) = \begin{cases} 1 & \text{if } 0 \le t < 1/2 \\ -1 & \text{if } 1/2 \le t < 1 \\ 0 & \text{otherwise} \end{cases}$$

that when scaled and translated generates an orthonormal basis:

$$\psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - 2^j n}{2^j}\right), \qquad (j, n) \in \mathbb{Z}^2$$

¹ For a detailed discussion of the relationship between quadrature mirror filters and wavelets, see [27].

Figure 4-2: Recursive filtering of signal s[n] through pairs of high-pass and low-pass quadrature mirror filters, producing details 1 through 4 and approximation 4.

Because of the discontinuous character of the Haar wavelet, smooth functions are not well approximated. This limitation started the investigation of alternative wavelets, and many varieties with different properties have been invented since. Probably the most popular is the Daubechies family of wavelets. The Daubechies 1 wavelet basis is the same as Haar's, and the family becomes progressively smoother as its number increases. How then do we choose a wavelet type? What is the best wavelet basis? Evidently, the definition of "best" depends on our goal.
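Before turning to that choice, the recursive filterbank of Figure 4-2 can be sketched with the Haar pair as plain pairwise averages and differences (a simplified, non-normalized variant of the orthonormal basis above; the orthonormal version would scale by $1/\sqrt{2}$; assumes an even-length input at every level):

```python
import numpy as np

def haar_step(s):
    """One level of the Haar filterbank: pairwise averages (approximation)
    and pairwise differences (detail), as in Figure 4-2."""
    s = np.asarray(s, float)
    even, odd = s[0::2], s[1::2]
    return (even + odd) / 2.0, (even - odd) / 2.0

def haar_decompose(s, levels):
    """Recursively split the approximation; returns (approximation, [details])."""
    details = []
    a = np.asarray(s, float)
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return a, details
```

Each recursion halves the length of the approximation, so the details describe structure at progressively longer time scales, which is exactly the multi-resolution property we want for musical form.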
If our goal were compact representation, then we would want a basis that used the smallest number of wavelets without losing much information. The usefulness of this is obvious for compression and noise reduction, but the choice of a wavelet basis that provides the most economical representation may also help us understand the nature and intrinsic properties of a given signal. A common way to measure how well a limited set of wavelets describes a given signal is by a linear approximation. Given a wavelet basis B = {ψ_n}, the linear approximation s_M of a signal s is given by the M largest-scale wavelets [27]:

$$s_M = \sum_{n=0}^{M-1} \langle s, \psi_n \rangle\, \psi_n. \quad (4.4)$$

The accuracy of the approximation is typically measured as the squared norm of the difference between the original signal and the approximation:

$$\epsilon[M] = \| s - s_M \|^2 = \sum_{n=M}^{+\infty} |\langle s, \psi_n \rangle|^2. \quad (4.5)$$

Figures 4-3 and 4-4 show wavelet decompositions of Bach's Prelude from Cello Suite no. 1. Both show the largest approximation and the details at all four levels. For Figure 4-3 we used the Daubechies 1 wavelet, and Daubechies 8 for Figure 4-4. Measuring the approximation for s_M = a4 in both cases with Equation 4.5, we get ε = 117.2 with Daubechies 8 and ε = 122.69 with Daubechies 1. Thus, Bach's prelude is better approximated with the Daubechies 8 wavelet.

Figure 4-3: Pitch sequence of Bach's Prelude from Cello Suite no. 1 and its wavelet decomposition using Daubechies 1 wavelets. s is the original pitch sequence, a4 the approximation at level 4, and dn are the details at level n. The sequence is sampled every sixteenth note.

Notice, though, that the decompositions of the signal using Daubechies 8 are always smooth, while those with Daubechies 1 are not. If the discontinuous character of the prelude is important to preserve, then Daubechies 1 is a better choice of wavelet.
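Equation 4.5 can be checked numerically in the Haar basis: since the basis is orthonormal, by Parseval's identity the error of keeping only the M coarsest coefficients is exactly the energy of the discarded ones (a self-contained sketch; the helper names are ours):

```python
import numpy as np

def haar_coeffs(s):
    """Full orthonormal Haar decomposition of s (length a power of two):
    returns the final approximation followed by details, coarse to fine."""
    s = np.asarray(s, dtype=float)
    out = []
    while len(s) > 1:
        detail = (s[0::2] - s[1::2]) / np.sqrt(2)
        s = (s[0::2] + s[1::2]) / np.sqrt(2)
        out.append(detail)
    return np.concatenate([s] + out[::-1])  # coarse-to-fine ordering

def approximation_error(s, M):
    """epsilon[M] of Equation 4.5: the energy of the coefficients
    discarded when only the M coarsest ones are kept."""
    c = haar_coeffs(s)
    return float(np.sum(c[M:] ** 2))
```

With M equal to the signal length the error is zero, and with M = 0 it equals the total energy of the signal; in between it decreases monotonically, as Equation 4.5 predicts.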
Because our goal is to generate new pieces and not to compress them, this qualitative criterion is certainly more useful to us. In the next chapter we will exploit the continuous quality of Daubechies 8 to obtain musical variations with similarly smooth characteristics.

Figure 4-4: Pitch sequence of Bach's Prelude from Cello Suite no. 1 and its wavelet decomposition using Daubechies 8 wavelets. s is the original pitch sequence, a4 the approximation at level 4, and dn are the details at level n. The sequence is sampled every sixteenth note.

4.1.4 Data Specific Basis Functions

As we saw in Chapter 3, state space reconstruction by the method of delays allows us to uniquely represent each step in a given sequence as a function of multiple variables: s_n = h(x_n) = h(x_{n,1}, x_{n,2}, …, x_{n,m}). h(x_n) must then be defined as a combination of the components of x_n, for example s_n = a_1(x_{n,1}) + a_2(x_{n,2}) + … + a_m(x_{n,m}), or a_1(x_{n,1}) a_2(x_{n,2}) ⋯ a_m(x_{n,m}), or one of many other forms. Can the structure of the manifold that results from the embedding tell us something about the structure of h(x_n) and its variables, and as a consequence about the observation signal s_n? It turns out that, similarly to the wavelet transform, a lag space representation automatically recovers structure at different time scales. This is one of the most important observations regarding state space reconstruction by the method of delays. As an example, consider the following signal: sin(2π10t) + sin(2π47t)/4 (Figure 4-5). This signal can be seen as a fast oscillation "riding" on a slow oscillation. In a three-dimensional lag space (Figure 4-6), this simple linear combination of two sinusoids becomes a torus, and the two levels of motion (a global cycle and a local cycle) are clearly distinguishable.
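The lag space construction for this example can be sketched directly; the sampling rate below is our own choice, and τ = 220 samples follows the figure captions:

```python
import numpy as np

# The two-sinusoid example: sin(2*pi*10 t) + sin(2*pi*47 t)/4,
# embedded by the method of delays in a three-dimensional lag space.
fs = 10_000                          # sampling rate (our choice)
t = np.arange(0, 1.0, 1 / fs)
s = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 47 * t) / 4

tau, m = 220, 3                      # lag in samples, embedding dimension
n = len(s) - (m - 1) * tau
X = np.stack([s[i * tau : i * tau + n] for i in range(m)], axis=1)
# Each row of X is one state (three delayed copies of the signal);
# plotted in 3-D, the trajectory traces out the torus of Figure 4-6.
```

Each coordinate of the embedding is simply a delayed copy of the same signal, which is exactly the redundancy that the rotation discussed next removes.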
In Chapter 3 we assumed all along that the systems we were dealing with were autonomous, yet a state space trajectory such as this one can be interpreted as an autonomous dynamical system being driven by another system. How do we separate these two systems?

Figure 4-5: Top: sin(2π47t)/4. Bottom: sin(2π10t) + sin(2π47t)/4.

Figure 4-6: Three-dimensional state space reconstruction of sin(2π10t) + sin(2π47t)/4, with τ = 220.

Because the state space is reconstructed by assigning delayed versions of a single signal to each coordinate, all coordinates carry the same information, delayed by some time τ. In other words, projecting the embedding onto any of the three dimensions will yield the same signal. A careful observation of the torus in three dimensions reveals that by rotating the structure so that the directions of maximal variance are aligned with the axes of the space, the previously redundant dimensions become decorrelated, separating the two sinusoids almost completely (Figure 4-7). Figure 4-8 plots the two signals resulting from the projection onto two of the three orthogonal axes after rotating the torus.

Figure 4-7: PCA on the torus resulting from the state space reconstruction of sin(2π10t) + sin(2π47t)/4, with τ = 220.

Figure 4-8: Two component dimensions of the state space reconstruction of sin(2π10t) + sin(2π47t)/4 (with τ = 220) after PCA transformation.

The method of delays therefore provides a means of decomposing our signal into a "natural" set of hierarchical basis functions. In other words, the functions are derived from the data rather than selected a priori. The transformation that rotates the state space trajectory so that its dimensions become decorrelated is called Principal Component Analysis.
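A minimal numerical version of this decorrelation (the sampling rate is our own choice; τ = 220 as in the figure captions):

```python
import numpy as np

# Delay-embed sin(2*pi*10 t) + sin(2*pi*47 t)/4 and decorrelate with PCA.
fs = 10_000
t = np.arange(0, 1.0, 1 / fs)
s = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 47 * t) / 4

tau, m = 220, 3
n = len(s) - (m - 1) * tau
X = np.stack([s[i * tau : i * tau + n] for i in range(m)], axis=1)

# PCA: rotate onto the eigenvectors of the covariance matrix.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
Y = Xc @ eigvecs        # decorrelated components; two of them carry
                        # (approximately) the separated sinusoids
```

After the rotation, the covariance matrix of the projected components is diagonal: the previously redundant delayed copies have become decorrelated, which is what Figures 4-7 and 4-8 illustrate.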
In Chapter 3 we saw that the interpolation of the state space manifold "sketched" by the reconstructed trajectory is a way of generalizing the dynamics of a signal. These recovered basis functions are the other half of the generalization of the structure. These building blocks are independent entities that can be modified and recombined to generate new state space geometries while preserving their identity (the "torusness"). For example, multiplying the amplitude of the first sinusoid by 5 results in the state space trajectory shown in Figure 4-9.

Figure 4-9: Three-dimensional state space reconstruction of 5 sin(2π10t) + sin(2π47t)/4, with τ = 220.

This is a simple example by which we illustrate concepts and tools for music structure generalization. In the next chapter we will use them to generate new pieces of music. For completeness, we compare PCA with the wavelet transform using again Bach's Prelude. We arbitrarily choose eight dimensions for the embedding of the Prelude, and τ = 1. Figure 4-10 shows the bases resulting from applying PCA to the lag space and projecting the trajectory onto each of the dimensions.

Figure 4-10: Pitch sequence of Bach's Prelude from Cello Suite no. 1 and each of the eight principal components extracted from an eight-dimensional lag space. s is the original pitch sequence, p_n the nth principal component. The sequence is sampled every sixteenth note.

4.1.5 Principal Component Analysis (PCA)

The main purpose of Principal Component Analysis is to decorrelate a set of correlated variables. As already suggested, the variables we are interested in decorrelating are the dimensions of a lag space.
We want each of the dimensions to become as independent as possible, so that each provides unique information about the state space trajectory. We have seen that a careful rotation of the lag space can separate the different-scale dynamics of a signal s[n]. But how do we do this mathematically? Essentially, we want the m dimensions that make up the lag space trajectory to be as independent and as informative as possible. PCA is the process of obtaining linear independence between the m vectors z_1, z_2, …, z_m, i.e., if a_1 z_1 + a_2 z_2 + … + a_m z_m = 0 then a_i = 0 for all i. Statistically, m variables are linearly independent if their covariances equal zero. Given that the m-length observation vectors x are vertical, their covariance matrix is defined as:

$$C_x = E[(x - E[x])(x - E[x])^T].$$

We want a transformation y = Mx such that C_y becomes diagonal. In relation to x, the covariance matrix of y is

$$C_y = E[(y - E[y])(y - E[y])^T] = E[(M(x - E[x]))(M(x - E[x]))^T] = M\, E[(x - E[x])(x - E[x])^T]\, M^T = M C_x M^T.$$

Choosing the rows of M to be the eigenvectors of C_x makes C_y diagonal, rendering the dimensions of the lag space linearly independent; scaling each row by the inverse square root of the corresponding eigenvalue further makes C_y the identity matrix. Yet it is important to be aware of the assumptions and limitations of this transformation [36].

1. Linearity. The state space manifold resulting from a time lag embedding may be (and often is) curved and twisted. To avoid redundancy, a nonlinear transformation would be required prior to applying PCA. In the sum-of-sinusoids example, the components were by definition linearly independent, which allowed us to separate them well. Unfortunately, this is rarely the case.

2. The principal components are orthogonal. Related to the problem of linearity, it is also possible that the points in lag space form distributions whose axes are not perpendicular to each other.
Thus, while PCA may decorrelate some axes, it may not decorrelate others.

3. Large variances reflect important dynamics. For the purpose of dimensionality reduction, PCA assumes that the important information is in the parameters (or dimensions) with the greatest variance, while those with the least variance are discarded as insignificant or noise. In the Bach example, the large-scale dynamics are the ones with the greatest variance, but this is not necessarily always the case (although it usually is).

While PCA has its limitations, it is still a useful transformation that allows us to extract and modify dynamics at different time scales. Other, more robust transformations attempt to achieve statistical independence between all the variables (p(x_i, x_j) = p(x_i) p(x_j)) rather than just linear independence. The approaches that fall into this category are called ICA: Independent Component Analysis. We will not deal with this family of transformations here.

4.2 Summary

The possibility of representing a signal as a combination of simpler basis functions is a second level of generalization. This decomposition allows us to transform the state space trajectory with much more flexibility and, by implication, the original observable sequence as well. Applying PCA to the lag space results in a set of useful basis functions that reveal hierarchical structures in a way similar to the wavelet transform. The simplicity of PCA compared to wavelets is another attractive feature. On the other hand, having an analytical description of the wavelet basis can give us more control over the types of decompositions and transformations we may perform. In the following chapter we will apply both transformations to concrete musical examples.
CHAPTER FIVE

Musical Applications

We have presented some theoretical background, tools, and methods useful for the analysis, modeling, and resynthesis of music, and are ready to give them some concrete applications. We have also mentioned how each of the methods provides some aspect of the generalization necessary for an inductive model. We now summarize the three types of generalization discussed:

1. State space interpolation (Chapter 3): The state space trajectory recovered by the method of delays serves as a kind of "wireframe" that suggests the existence of a full hyper-surface, or manifold, that can be estimated by interpolation. New trajectories that fall on the manifold and follow its flow constitute the generalization. By following the flow, new pieces with similar dynamics and states can be generated.

2. Abstraction: states vs. dynamics (Chapter 3): The recovered state space trajectory provides information about the states visited by the trajectory and the dynamic relationships between these states. Each of these two pieces of information can be separated and varied independently for the generation of new music.

3. Decomposition (wavelets, PCA, ICA) (Chapter 4): Through wavelets, PCA, or ICA, it is possible to decompose a musical sequence into hierarchical, simpler, and meaningful components. These decompositions reveal embedded dynamics and provide an additional level of understanding of musical structure, as well as flexibility for subsequent transformations.

All three aspects can be used as tools in music analysis and synthesis, and may be combined in many different ways. Here we can only give a few examples of their use and suggest some other possibilities. Sound files from the examples discussed in this chapter, as well as additional relevant material, can be found at http://www.media.mit.edu/~vadan/msthesis.

5.1 Music Structure Modeling and Resynthesis

Combined Components vs.
Independent Components in Multidimensional Signals

When considering the multiple components of a piece of music, we can opt to model their dynamics independently or collectively. Keeping things separate gives us more control over the variety of transformations we can apply to the music, while modeling all components collectively allows us to generate new trajectories that preserve the aggregate dynamics of the original training piece. The state space reconstruction of these two approaches is depicted in Figure 5-1.

Figure 5-1: Top: The aggregate dynamics of all the musical components, such as pitch, loudness and IOI, are modeled jointly by constructing a single state space. Bottom: Multiple state spaces are reconstructed and modeled from each component independently.

For the purpose of state space reconstruction, a multidimensional signal

$$\mathbf{s}[n] = \begin{bmatrix} s_{1,0} & s_{1,1} & \cdots & s_{1,n} \\ s_{2,0} & s_{2,1} & \cdots & s_{2,n} \\ \vdots & & & \vdots \\ s_{l,0} & s_{l,1} & \cdots & s_{l,n} \end{bmatrix} \quad (5.1)$$

can be treated in essentially the same way as a scalar signal. Each point in state space is defined as a set of delays of each of the signal's components:

$$x[n] = (s_1[n], s_1[n-\tau], \ldots, s_1[n-(m-1)\tau],\ s_2[n], s_2[n-\tau], \ldots, s_2[n-(m-1)\tau],\ \ldots,\ s_l[n], s_l[n-\tau], \ldots, s_l[n-(m-1)\tau]). \quad (5.2)$$

The reconstructed state space has dimension lm. As can be seen in the previous equation, the resulting state space is an aggregate of the individual state spaces of each component in the signal s[n]. Thus, a different τ can be chosen for each component s_i[n]. Since the value ranges of the components s_i[n] will usually differ widely, it is important to whiten the reconstructed state space to avoid having dimensions that are too flat relative to others. Having different variances in the different dimensions would distort the map or flow estimation, since our weighting and averaging functions rely on Euclidean distances. After estimating a new trajectory, we must de-whiten the data.
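A sketch of the joint embedding of Equation 5.2, with a per-component lag, together with the whitening and de-whitening steps (the helper names are ours):

```python
import numpy as np

def embed(components, taus, m):
    """Joint delay embedding of several component signals (Equation 5.2).
    components: list of 1-D arrays; taus: one lag per component;
    m: number of delays per component. Each state has dimension l*m."""
    n = min(len(s) - (m - 1) * t for s, t in zip(components, taus))
    cols = [np.asarray(s, dtype=float)[i * t : i * t + n]
            for s, t in zip(components, taus)
            for i in range(m)]
    return np.stack(cols, axis=1)

def whiten(X):
    """Scale each dimension to zero mean and unit variance, keeping the
    statistics so a newly estimated trajectory can be de-whitened."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / sd, mu, sd

def dewhiten(Y, mu, sd):
    return Y * sd + mu
```

With, say, a pitch sequence and an IOI sequence, different lags per component simply change how many states the joint space contains; whitening then equalizes the widely differing value ranges before any Euclidean-distance-based flow estimation.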
5.1.1 Interpolation

As we reviewed in Chapter 3, the type of interpolation we perform on the reconstructed state space can greatly affect the qualitative characteristics of the estimated flow. Thus, a careful choice of the state space model parameters can be crucial for obtaining good musical output. For the local linear models discussed in Chapter 3, there are two parameters to consider: the number of nearest neighbors and the exponent p in Equation 3.8 (i.e., the weight of each neighbor as a function of its distance). In addition to whatever modeling technique we use, there is the choice of embedding dimension and of the initial state x_0 for the estimation of a new trajectory.

As a concrete example of how the different parameters affect the interpolation, and thus the outcome of new musical pieces, we use Ligeti's Piano Etude no. 4, Book 1 (Figure B-1). We combine pitch, loudness, IOI and duration in a single lag space and estimate a new trajectory for multiple combinations of the parameters already discussed. For each new trajectory estimated, we systematically vary one parameter value at a time. The embedding dimensions explored are 2, 4, 6, and 8 per parameter; given that we are embedding four parameters together, the lag spaces actually have 8, 16, 24, and 32 dimensions. The numbers of nearest neighbors used are 2, 4, 6, and 8, and the values of the exponent p are 2, 16, 64, and 128. As the initial condition we always use the mean of the space. Figure B.3 in Appendix B shows the newly generated sequences in pianoroll representation. All sequences are 500 notes long.

Number of Neighbors and the p Exponent

From the pianoroll plots of the generated pieces we see that for very high values of p, particularly p = 128, the resulting sequences are almost identical to some fragment of the original Etude.
This is because for such high values of p the nearest neighbor of the point being estimated has much greater weight in defining its behavior than the more distant points. Thus, no matter how many neighbor points are considered in the estimation, having p = 128 is practically the same as considering only one nearest neighbor. The resulting trajectories are then transposed copies of the original trajectory. The other problem with using only one neighbor, or too high a value of p, is that, particularly for embeddings with insufficient dimensions, the estimated trajectory is very likely to fall into an infinite loop. In Figure B.3 we see several instances of this: the second half of panels two and three (from left to right, top to bottom), as well as panel eight. Ideally we want to use several neighbors so that the interpolation is more accurate. But, as discussed in Chapter 3, the use of more than one nearest neighbor results in smoothed-out maps, distorting the originally angular quality of discontinuous dynamics. Thus, a careful balance between the number of nearest neighbors and the value of the exponent p must be found in order to produce novel yet sharp interpolations. Naturally, if the original series were continuous, then the choice of the number of neighbors would not be a problem.

Dimensionality of the Embedding

In Chapter 3 we discussed the requirement of finding a dimension high enough for the embedding to capture all the degrees of freedom of a continuous autonomous system, and for a bijective relation to exist between the observation sequence s and the state space trajectory x. What are the implications of embedding discontinuous series by the method of delays? Mainly, that trajectories may cross without violating the bijective relation between the state space trajectory and the observation sequence, because a trajectory crossing of discontinuous series does not imply an overlap of points.
Thus, the embedding dimension for discontinuous sequences can be smaller and still be good for the generation of new sequences. From the plots in Figure B.3 we can see that, in general, large-scale dynamics from low-dimensional embeddings (2 or 4) are rather chaotic compared to those of high-dimensional ones.¹ To understand why this is the case, consider the nature of the reconstruction by the method of delays. Each state in lag space is defined as a set of ordered values taken from the observation sequence. If the embedding dimension m = 2, we define each state as a set of two values from the observation. Setting τ = 1, for example, defines each state as two successive values of the observation sequence. Choosing a four-dimensional embedding would give us a map of all the four-value combinations found in the sequence. Evidently, one state might be visited more than once if the dimensionality is not high enough. It should be clear that the higher the dimensionality of the space, the sparser it will be. If the dimension is too high, very few points will occupy the space, and different trajectory segments will lie far apart from each other, making their interaction for the generation of new trajectories more difficult. If the dimensionality is small, the space will be very dense and points may fall on top of each other. Here, new trajectories will be estimated from several trajectory segments that are close to each other in the embedding, but not necessarily close temporally in the original sequence; thus, these points may have very different velocities.

Initial State

This is the most difficult parameter to estimate in terms of the predictability of the output. Evidently, choosing any of the points that fall on the embedded trajectory as the initial state will result in the original music sequence from that point on. The actual estimation of new states begins when the end of the trajectory is reached.
In time series prediction this is the natural point at which to begin the estimation of a new trajectory. But for the purpose of generating novel trajectories without forecasting motivations, any point in the state space that does not fall on the original trajectory will do. A reasonable thing to do might be to select the initial state at random, but for the purpose of evaluating parameter values and dimension sizes, it is best to choose a constant initial state.

¹This is similar to what happens with Markov models, where small-order models result in sequences that are locally similar to the original but globally unrelated, while higher-order models yield sequences that preserve the structures of the original piece at higher scales.

There is only so much we can do by modeling all the components together. The lack of control over the independent parameters and the limited number of variables in the local linear models used make it difficult to produce novel pieces that are clearly different from the training piece. Because the training piece in its entirety is modified by only a few parameters, the newly generated pieces are coarse variations on the original. In order to produce more refined transformations, we must make use of the decomposition methods discussed in the previous chapter.

5.1.2 Abstracting Dynamics from States

Rotations

A geometric interpretation of the reconstructed state space suggests a variety of transformations of the musical data. Simple transformations like rotations have been used as generators of musical variation [43, 16, 1]. In fact, the classical transformations of inversion, retrograde and retrograde inversion, found in western music since the 16th century, can be understood as four simple space-time transformations: 180° rotations in two-dimensional lag space and temporal reversal.
In Chapter 4 we reviewed how PCA on a reconstructed state space is nothing more than a special-purpose rotation for decorrelating the component dimensions. More generally, rotations can be interpreted as a way of changing the states visited by a dynamical system while keeping the dynamics unchanged. Thus, a state space trajectory defines a set of possible state sequences that share the same dynamics. If the purpose of the embedding is the application of geometric transformations exclusively, then the requirements for a proper embedding for state space reconstruction (see Chapter 3) can be ignored, and all dimensions m ≥ 2 are useful. In this interpretation, the dimension of the embedding defines the possible degrees of transformation. The number of degrees of freedom of a rotation is defined by the dimensionality of the space: an m-dimensional space has $\binom{m}{2} = m(m-1)/2$ planes of rotation [2]. Thus, higher-dimensional spaces have the potential for a greater variety of transformations.

A rotation is a linear combination of the m elements defining each point in lag space. For example, given a series s[n], we embed it by the method of delays with a delay of τ and dimension m = 3. We get the embedded trajectory x[n], so that each point in the m-dimensional space is x_n = (s_n, s_{n-τ}, s_{n-2τ}). If we apply a rotation by angle θ about the s_{n-2τ} axis of the embedding space, we get:

$$x'_n = \begin{bmatrix} s_n & s_{n-\tau} & s_{n-2\tau} \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = (\cos\theta\, s_n - \sin\theta\, s_{n-\tau},\ \sin\theta\, s_n + \cos\theta\, s_{n-\tau},\ s_{n-2\tau}). \quad (5.3)$$

These rotations can be applied to the lag space of any musical component independently, or to that of the combined components. In the latter case the components are combined, so that pitch, for example, will be a combination of pitch, rhythm, etc.

Figure 5-2: First six measures of Bach's Courante from Cello Suite no. 4.

As an example of rotating the
lag space of a single component, we take the first six measures of Bach's Courante from Cello Suite no. 4. We consider only one component: the inter-onset interval (which in this case is the same as the duration of each note).² We embed the sequence in a three-dimensional lag space (Figure 5-3). We explore all 64 combinations of 90° rotations in this three-dimensional space and plot the projections of each rotation. As can be seen in Figure 5-4, many of the resulting patterns are identical. Only six patterns are actually different (Figure 5-5). Thus, with 90° rotations we obtain only five new sequences out of a total of 64 rotations. We are curious to know what other patterns might be obtained from other rotations, so we compare the projections of all 512 45° rotations.

²While the trill "embellishment" in measure 4 is not written out explicitly, it is an important element of the rhythmic sequence. Thus, in the electronic score we write it out as 32nd notes.

Figure 5-3: Three-dimensional embedding of the IOIs of the first six measures of Bach's Courante from Cello Suite no. 4.

Figure 5-4: Projections of all 64 90° rotations of the three-dimensional embedding of the IOIs from Bach's Courante.

Figure 5-5: The six distinct patterns among the 64 90° rotations of the embedding.

Out of 512, 26 (including the original) are distinct. Figure 5-6 is a plot of all 26 non-repeating patterns. Figure 5-7 shows patterns no. 1 and no. 25 in musical notation.
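The rotation experiment can be reproduced in miniature. Compositions of 90° rotations about the coordinate axes are signed permutation matrices, so at most six distinct one-dimensional projections can occur, matching the six patterns found above (the IOI sequence below is invented for illustration; it is not the Courante's):

```python
import numpy as np
from itertools import product

def rot(axis, theta):
    """3-D rotation by theta about one coordinate axis (cf. Equation 5.3)."""
    c, s = np.cos(theta), np.sin(theta)
    i, j = [(1, 2), (0, 2), (0, 1)][axis]
    R = np.eye(3)
    R[i, i] = R[j, j] = c
    R[i, j], R[j, i] = s, -s
    return R

# A short IOI-like sequence, embedded with m = 3 and tau = 1, then rotated
# by every combination of 90-degree multiples about the three axes; we
# collect the distinct projections onto the first axis.
s = np.array([1.0, 0.5, 0.5, 1.0, 0.25, 0.25, 0.25, 0.25, 1.0])
X = np.stack([s[:-2], s[1:-1], s[2:]], axis=1)

patterns = set()
for a, b, c in product(range(4), repeat=3):          # 4**3 = 64 rotations
    R = rot(0, a * np.pi / 2) @ rot(1, b * np.pi / 2) @ rot(2, c * np.pi / 2)
    proj = np.round((X @ R)[:, 0], 6)                # project to one dimension
    patterns.add(tuple(proj))
```

Intermediate angles such as 45° mix the delayed copies into genuinely new duration values, which is why the 512-rotation experiment yields many more distinct patterns than the 90° case.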
They are approximations, quantized to the 64th note, particularly in cases where no clear rational proportion was found.

Figure 5-6: Non-repeating patterns resulting from all 512 45° rotations of the three-dimensional IOI embedding. Pattern number 14 is Bach's original rhythmic sequence.

From our interpretation, the rhythmic pattern in Bach's Courante is one of a family of patterns that share a common structure. Here we see one of the advantages of considering musical space as a continuum rather than a discrete alphabet. We have generated new durations not present in the original sequence, as well as new sequences that have a similar structure to the original.
Because these new sequences are combinations of the values of an original sequence, this generative approach can be called combinatorial, as opposed to systems based on discrete Markov models such as the one we discussed in Chapter 1, which are typically permutational. For an additional example, in pitch space, see http://www.media.mit.edu/~vadan/msthesis.

Figure 5-7: Musical notation of patterns no. 1 and no. 25 from the 26 non-repeating patterns resulting from 45° rotations of the embedded IOIs.

5.1.3 Decompositions

The top portion of Figure 5-8 shows a fragment of Ligeti's Etude no. 4, Book 1, in pianoroll representation.³ The Etude is composed of two layers. One consists of an ostinato characterized by an ascending pattern with a regular rhythm. The other is a homorhythmic polyphonic texture. Throughout the piece, the ostinato is transposed up and down in jumps of one or more octaves. We want to transform Ligeti's Etude so that its global pitch dynamics become continuous. For this we wavelet transform the pitch sequence of the Etude using the Daubechies 8 wavelet. The decomposition is done at 7 levels. After decomposing the signal, we perform a 4-dimensional state space reconstruction of the largest-scale approximation and rotate it by π radians on each of the six rotation planes. We then project the rotated trajectory back to a single dimension and inverse wavelet transform it to recover the complete pitch sequence of the Etude. Figure 5-9 depicts this process. Because the Daubechies 8 wavelet is smooth, the resulting transformation is smooth as well. The bottom of Figure 5-8 depicts the result of the transformation.

³This pianoroll plot was generated using the MATLAB MIDI Toolbox [13].
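The pipeline of Figure 5-9 can be sketched end to end. To stay self-contained, this sketch substitutes the Haar filters for Daubechies 8, a 2-dimensional lag space for the 4-dimensional one, and fewer levels; the function names are ours:

```python
import numpy as np

def dwt(s, levels):
    """Haar analysis to `levels` scales: returns (approximation, details)."""
    a = np.asarray(s, dtype=float)
    details = []
    for _ in range(levels):
        details.append((a[0::2] - a[1::2]) / np.sqrt(2))
        a = (a[0::2] + a[1::2]) / np.sqrt(2)
    return a, details

def idwt(a, details):
    """Inverse of dwt: rebuild the signal from approximation + details."""
    for d in reversed(details):
        up = np.empty(2 * len(a))
        up[0::2] = (a + d) / np.sqrt(2)
        up[1::2] = (a - d) / np.sqrt(2)
        a = up
    return a

def vary_large_scale(s, levels, theta):
    """Figure 5-9 in miniature: transform, rotate the coarse approximation
    in a 2-D lag space, project back, and inverse transform."""
    a, details = dwt(s, levels)
    x = np.stack([a, np.roll(a, -1)], axis=1)   # m = 2, tau = 1 embedding
    c, t = np.cos(theta), np.sin(theta)
    rotated = x @ np.array([[c, t], [-t, c]])
    return idwt(rotated[:, 0], details)         # project onto one axis
```

With θ = 0 the pipeline is a perfect round trip; nonzero angles alter only the large-scale contour while the fine-scale details are restored untouched, which is the effect shown at the bottom of Figure 5-8.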
Figure 5-8: Top: Fragment from Ligeti's Piano Etude no. 4, Book 1. Bottom: Etude no. 4 after a π-radian rotation of the state space reconstruction of the 7th-level approximation using the Daubechies 8 wavelet.

Figure 5-9: Diagram of the transformation process used to vary, and render continuous, the large-scale pitch contour of Ligeti's Etude. First, the pitch sequence is discrete wavelet transformed. Then, the largest-scale approximation is embedded in three-dimensional lag space and rotated to obtain a variation of the original trajectory. Finally, after replacing the original approximation with the variation, the decomposition is inverse discrete wavelet transformed.

5.2 Outside-Time Structures

In this work we have focused our attention on the dynamic properties of music, and have pointed out the advantage of considering the musical space as a continuum rather than a discrete alphabet. But in addition to the temporal aspects of music, much of the quality of a piece comes from its "outside-time" structures [43] as well. These consist of the sieves through which the different musical components are discretized. For example, a piece sieved through a major scale feels very different from the same piece sieved through a Phrygian scale. While the hard distinction between in-time and outside-time may sometimes be inaccurate (modes are a case where the sieve is dynamically defined), the distinction provides a simple way of modeling music structure.

5.2.1 Sieve Estimation and Fitting

For practical as well as perceptual reasons, it is often useful to consider scales as cyclic structures. In Chapter 3 we mentioned how the helical pitch space is a better representation of pitch perception than a straight line.
By collapsing the helix into a circle we can reduce the infinite number of steps in the helical staircase to a compact set of equivalent pitch classes. Indeed, in musical pitch theories such as Forte's [17] or Estrada's [14], there is the notion of octave equivalence. This notion allows the reduction of the pitch space to a small set of pitch classes. This is simply a modulo operation on the pitch space; mod 12 is the modulus typically used for the twelve tone equal tempered scale. Similarly, we may define a time sieve in the rhythmic dimension (IOIs) via a modulo. If there exists a clear beat or higher rhythmic structure like a meter, then we can reduce all the IOIs to a time sieve modulo the meter. Thus, the same helical structure used for pitch space can be used to represent similarity (or equivalence) between events in relation to their placement in time. Figure 5-10 depicts an example of this representation for a triple-time meter.

Figure 5-10: A helical time line representing equivalent points in a measure in triple time.

Here we take a simple statistical approach to sieve modeling by computing a histogram of the values found in each of the musical components.⁴ Essentially, the histogram of each component is the model of the sieve. To sieve a newly generated trajectory we take two measurements into consideration:

⁴ Contrary to traditional tonal music theory, where a scale in pitch space is defined by the underlying functional harmony, all pitches found in a piece are considered part of the scale.

1. The relative frequencies rf(E_k) and rf(E_{k+1}) of the two scale values E_k and E_{k+1} surrounding the point s_n in the new series, i.e. E_k ≤ s_n ≤ E_{k+1}.

2. The relative distance between the point to be fitted, s_n, and each of its surrounding scale values, i.e.
||E_k − s_n|| and ||E_{k+1} − s_n||.

The complete rule used to replace the estimated value by a value of the sieve is:

s_n ← E_k if rf(E_k)^p / ||E_k − s_n|| ≥ rf(E_{k+1})^p / ||E_{k+1} − s_n||, and s_n ← E_{k+1} otherwise.

We use the exponent p to adjust the weight of the relative frequencies versus the distances. If the distance between the generated point s_n and its surrounding sieve points is considered to be more important than their relative frequencies, then we use a low value for p, and vice versa.

5.3 Musical Chimaeras: Combining Multiple Spaces

An interesting problem in automatic music generation is that of combining two or more pieces to generate a new one. This is not a trivial task, and in the context of the present work the most natural thing to do is to combine the reconstructed state spaces. There are three basic ways of doing this:

Method 1: Generate a new trajectory from the average of the flows estimated for each of the embeddings:

z_{n+1} = aF(x_n) + bF(y_n).    (5.4)

In this approach we estimate the behavior of a point in each of the separately reconstructed state spaces. The estimated velocities are then averaged to obtain the behavior of the new point resulting from the combination. The nature of this method suggests considering the combination of two pieces as a mixture where we can control the percentage of each piece in the mix. For example, in a two-piece mixture, we could combine 90% of piece 1 and 10% of piece 2 simply by weighting the estimated velocity vectors for each piece: z_{n+1} = 0.9F(x_n) + 0.1F(y_n). All the embeddings must have the same number of dimensions.

Method 2: Estimate a new trajectory from the superposition of the embedded trajectories:

z_{n+1} = F(x_n, y_n).    (5.5)

As in method 1, all spaces must have the same dimensionality. The dimensionality necessary to capture the degrees of freedom of one piece will usually be smaller than that necessary for all the pieces combined.
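The sieve-fitting rule described in Section 5.2.1 can be sketched as follows. This is a minimal sketch under our reading of the (garbled in reproduction) equation: each of the two surrounding sieve values gets a score rf(E)^p divided by its distance to s_n, and the higher score wins. The function name and the example scale/frequency values are ours.

```python
import bisect

def fit_to_sieve(s, sieve, rf, p=1.0):
    """Snap a generated value s onto the sieve, weighing the relative
    frequency of the two surrounding sieve values against their distance.
    sieve: sorted scale values; rf: their relative frequencies."""
    k = bisect.bisect_right(sieve, s) - 1
    if k < 0:
        return sieve[0]
    if k >= len(sieve) - 1:
        return sieve[-1]
    lo, hi = sieve[k], sieve[k + 1]
    w_lo = rf[k] ** p / max(abs(lo - s), 1e-9)      # rf(E_k)^p / ||E_k - s_n||
    w_hi = rf[k + 1] ** p / max(abs(hi - s), 1e-9)  # rf(E_k+1)^p / ||E_k+1 - s_n||
    return lo if w_lo >= w_hi else hi

# A major-scale pitch-class sieve with made-up relative frequencies:
sieve = [0, 2, 4, 5, 7, 9, 11]
rf    = [0.3, 0.05, 0.15, 0.1, 0.25, 0.1, 0.05]
print(fit_to_sieve(5.6, sieve, rf, p=0.1))  # low p: distance wins -> 5
print(fit_to_sieve(5.6, sieve, rf, p=3.0))  # high p: frequency wins -> 7
```

The two calls illustrate the role of p: with p = 0.1 the nearest sieve value (5) is chosen; with p = 3.0 the more frequent value (7) wins despite being farther away.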
Thus, we estimate the embedding dimension necessary for all the combined pieces by taking them as a single sequence. Once the embedding is made, we iteratively compute z_{n+1}.

For methods 1 and 2 we can ask the following questions: how should the spaces be overlapped? Should they be overlapped without alteration, or should they be transformed in some way, perhaps translated so that they share the same centroid, or rotated using PCA so that their principal components are aligned? Again, the goal of merging two or more pieces is an important determinant of the approach to be taken. If our goal is to keep each of the training pieces as clearly perceivable as possible, then the geometric transformations just mentioned must be avoided or kept to a minimum.

Method 3: Construct a state space from both pieces directly:

x_n = (s1_{n−(m−1)τ}, s2_{n−(m−1)τ}, s1_{n−(m−2)τ}, s2_{n−(m−2)τ}, ..., s1_{n−τ}, s2_{n−τ}, s1_n, s2_n)    (5.6)

In this approach, instead of superposing two m-dimensional spaces, we create a new (l+n)-dimensional space by combining an l-dimensional and an n-dimensional space. Thus, in this method the number of dimensions for the reconstructed state space of each piece may be different. While in methods 1 and 2 each lag space dimension is a combination of both training pieces, in method 3 half of the dimensions are defined by one piece and half by the other. Thus, each point in the lag space is defined by both pieces. The difference is important because our new observation will be a projection of the lag space onto one dimension per component. What dimension should this be? Should it belong to training piece 1 or piece 2? The piece to which the projection dimension belongs will dominate the new output. This can be seen as the carrier piece, while the other is the modulator. As discussed in Chapter 4, this approach can be interpreted as two systems driving each other.
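The joint embedding of Eq. (5.6) can be sketched directly. This is a minimal sketch assuming NumPy; for simplicity it uses the same lag depth m for both sequences, though, as noted above, the two pieces' dimensionalities may differ. The function name is ours.

```python
import numpy as np

def joint_embed(s1, s2, m, tau=1):
    """Method 3: interleave the delay coordinates of two observation
    sequences into one joint lag space, as in Eq. (5.6). Each point is
    (s1[n-(m-1)t], s2[n-(m-1)t], ..., s1[n-t], s2[n-t], s1[n], s2[n])."""
    n = min(len(s1), len(s2)) - (m - 1) * tau
    cols = []
    for i in range(m):  # oldest lag first, as in the equation
        cols.append(np.asarray(s1, dtype=float)[i * tau : i * tau + n])
        cols.append(np.asarray(s2, dtype=float)[i * tau : i * tau + n])
    return np.stack(cols, axis=1)

X = joint_embed([0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15], m=2)
# X[0] pairs the lags of piece 1 with those of piece 2: (0, 10, 1, 11)
```

Projecting the estimated trajectory back onto a dimension built from `s1` (or from `s2`) then chooses which piece acts as the carrier.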
To show how each of these methods performs, we combine two pieces with each of the three methods. We are particularly interested now in trying to preserve clearly perceivable characteristics from the two training pieces. Thus, we do not alter the embeddings of any of the pieces in any way prior to their combination. The pieces we use are Ligeti's Piano Etudes 4 and 6 from Book 1. Again, in order to see how these three methods behave under different parameter configurations, we systematically explore several values for the three parameters: the embedding dimension, the number of nearest neighbors, and the weight of the neighbors relative to their distance (the p exponent). We generate 500 points (notes) for each test, again using the mean of the reconstructed trajectory as the initial point.

Method 1

This method seems to be the least likely to work. Looking at Figure B.4 and listening to the sequences generated with this method, we see that there is little resemblance between these and the original Etudes. The pieces generated are noisy. In other words, there is little structure and the music sounds quite random. This is especially true for low dimensional embeddings. As the embedding dimension increases, though, the global dynamics become more varied and interesting. Yet, the local dynamics are usually very irregular and the source pieces imperceptible. Imagine embedding many pieces of music, each in its own state space. What is the velocity of a single point in each of the spaces? We will most likely discover these velocities to be very different. Averaging these estimated velocities will then result in unpredictable dynamics and possibly neutralization due to opposing velocities.

Method 2

The choice of the number of nearest neighbors in the lag space interpolation is important in defining the mixture of the pieces. If only one neighbor is used, the dynamics of each point will be defined by only one of the pieces being combined.
The greater the number of nearest neighbors, the greater the chances that the behavior of a point will be defined by points from the two training pieces. The p exponent equally affects the mixture. As we have seen, too high a value of p will have practically the same result as using only one nearest neighbor, since the weights of distant points will be very small compared to those of nearby points. Figure 5-11 is a pianoroll plot of a 500-note sequence generated from the mixture. The parameters used to generate this result were p = 200, number of neighbors = 11, and embedding dimension = 3.

Figure 5-11: Mixture of Ligeti's Etudes 4 and 6, Book 1, using method 2, with parameters p = 200, number of neighbors = 11, and embedding dimension = 3.

What is characteristic about this method is the fact that it tends to behave like a collage, deserving the chimaera alias (this can clearly be seen in Figure 5-11). This is because in the superposition of the two state spaces, the trajectories from both pieces will not always be close to each other. At some points in the combined space, trajectories will diverge, while at others they will cross or pass close to each other. Thus, at some moments the new estimated trajectory will be dominated by one piece; at places where the trajectories of the training pieces meet there will be a mixture and/or a shift of dominance in the estimation from one piece to the other, making the resulting sequence a kind of splice, mix, and reordering of fragments of the training pieces. Naturally, the degree to which pieces will be combined depends on their degree of similarity. Similar states between pieces will be close to each other, while states that are not similar will be apart.
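One interpolation step of the kind discussed here can be sketched as a distance-weighted nearest-neighbor estimate. This is a sketch of our reading of the local model, assuming NumPy; the function name, the inverse-distance weighting form, and the toy data are ours, not the thesis's exact formulation.

```python
import numpy as np

def knn_flow_step(z, points, velocities, k=5, p=2.0):
    """One interpolation step: estimate the motion at point z as the
    distance-weighted average of the velocities of its k nearest
    neighbors in the (possibly combined) lag space. A large p behaves
    like a single nearest neighbor; a small p blends a broad mixture."""
    d = np.linalg.norm(points - z, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] ** p + 1e-12)  # inverse-distance weights
    w /= w.sum()
    return z + w @ velocities[idx]

# Toy flow: three states on a line, all moving one step to the right.
pts = np.array([[0.0], [1.0], [2.0]])
vel = np.array([[1.0], [1.0], [1.0]])
z1 = knn_flow_step(np.array([0.5]), pts, vel, k=2)  # -> [1.5]
```

With the points of both embedded Etudes stacked into `points`, the choice of k and p decides whether each step is governed by one piece or by a blend of the two.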
It is important, then, to find a representation that will make the spaces overlap as much as possible, but, again, without altering them, since we want to be able to hear the original training pieces in the new piece. Therefore, we modify the components of the sequences (pitch, loudness, IOI, duration) of the training pieces so that they match as much as possible without altering perceptual features. For example, a histogram of the IOIs of Etude 4 shows that the most frequent IOI value is 0.5 (a quarter note), while that of Etude 6 is 0.25 (an eighth note) (Figure 5-12). Because the difference can be interpreted simply as one of tempo (which generally is not a defining characteristic of a piece unless the difference between tempi is too great), we scale the IOIs of one of the Etudes to equal those of the other.

Figure 5-12: Histograms of the IOIs of Ligeti's Etudes 4 and 6.

A similar transformation can be applied to the pitch component. If the register is not something we consider very important, we can use the sequence of pitch intervals (the Inter Pitch Intervals) obtained simply by taking the difference of the pitch sequence. We don't do this in the present example, though, keeping the original pitch sequences unchanged. The dimensionality of the superimposed spaces is also an important factor defining the degree to which the new estimated trajectory will be a combination of the two training trajectories. Since each point is defined by a sequence of notes in the training pieces, the higher the dimensionality of the state spaces, the less likely it is that points from different pieces will be close to each other. Thus, the best results are obtained when the dimensionality of the embedding is low.
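The two normalizations mentioned above, matching tempi via the modal IOI and discarding register via pitch intervals, can be sketched as follows. This is a minimal sketch assuming NumPy; the helper names and the toy IOI lists are ours.

```python
import numpy as np
from collections import Counter

def modal_value(v):
    """Most frequent value in a sequence (its histogram peak)."""
    return Counter(np.round(v, 3).tolist()).most_common(1)[0][0]

def match_tempo(ioi_ref, ioi):
    """Scale one piece's IOIs so its modal IOI equals the reference's,
    treating the mismatch as a difference of tempo."""
    return np.asarray(ioi, dtype=float) * (modal_value(ioi_ref) / modal_value(ioi))

def to_intervals(pitches):
    """Register-free pitch representation: successive pitch intervals."""
    return np.diff(pitches)

ioi4 = [0.5, 0.5, 1.0, 0.5]  # modal IOI 0.5 (a quarter note), as in Etude 4
ioi6 = [0.25, 0.25, 0.5]     # modal IOI 0.25 (an eighth note), as in Etude 6
scaled = match_tempo(ioi4, ioi6)  # doubles Etude 6's IOIs
```

After scaling, the IOI histograms of the two pieces peak at the same value, so their rhythmic state spaces overlap far more closely.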
On the other hand, as we saw before, choosing too low a dimensionality will make the high scale dynamics of the resulting trajectory more chaotic.

Method 3

The third method is probably the most intriguing. As in method 1, the generated sequences resulting from the combination of both pieces into a single state space are generally noisy. A thought experiment might help visualize what's happening. Recall that the embedding of a sum of sinusoids by the method of delays results in a toroidal structure (Section 4.1.4). In lag space these two sinusoids can be interpreted as two signals driving each other, in a similar way as planets pull and affect each other's motion. Including a third dimension with its own dynamics will add an additional degree of freedom and complexity to the system. By the Central Limit Theorem, the resulting dynamics will tend to gaussian noise as the number of dimensions with independent dynamics tends to infinity. Thus, if the dynamics of the training pieces are complex, the dynamics resulting from their combination will be even more complex, and the result will be noise. This method is effective only when the spaces to be combined are relatively simple. Figure B.7 in Appendix B shows the result of combining Ligeti's Etudes with this method for multiple parameter values. Clearly, the resulting trajectories from this mixture are overly noisy. But rather than embedding all the parameters of both pieces in a single space, we can embed each of the four components (pitch, loudness, IOI and duration) independently (see bottom of Figure 5-1) and combine the spaces corresponding to the same component (Figure 5-13). Figure B.7 shows several 500-note sequences generated with this configuration for the same parameter combinations as with methods 1 and 2. The choice of the exponent p greatly affects the output of the interpolation using method 3.
From the multiple panels in the figures it can be seen that no matter what the choice of dimension or number of nearest neighbors is, when p = 2 the resulting sequence is noisier than with higher exponents. In method 3, p is clearly the most important factor determining the degree to which the points interact, and thus the degree to which the pieces mix. It could be labeled the "mixing level knob" of the method.

What State Spaces Are Made Of

In addition to the alternative ways of combining state spaces, there are also multiple ways of defining the spaces to be combined. We have jointly modeled the dynamics of all the components of a piece (e.g. pitch, loudness, brightness, IOI), and each component independently. We've also seen that we can decompose each component and model each of its subcomponents independently. In addition, we can reconstruct a state space from combinations of subcomponents derived from corresponding components of different pieces: for example, the large scale approximations of pitch from various pieces on one side, and their corresponding details on another (Figure 5-14). We could also take the high scale dynamics from a component in one piece and the low scale dynamics from another, or even combine different components from different pieces. These more intricate and experimental approaches may result in unique variations, but the original training pieces will most certainly be lost perceptually.

Figure 5-14: Corresponding components from different pieces are combined at different scales, but each scale is independent from the others.

CHAPTER SIX
Experiments and Conclusions

6.1 Experiments

In this work we have presented methods to generalize and extrapolate musical structures from existing pieces for the purpose of generating novel music.
Thus, what we might evaluate is the novelty of the pieces generated with these methods in relation to the source or training pieces. But testing whether a piece generated by the system is different from the training piece is trivial. As we showed in the previous chapter, we can generate completely new sequences with enough transformations, however arbitrary these may be. What we are actually looking for is that ambiguous middle ground where the generated pieces share structural characteristics with the training piece(s) while at the same time being as novel as possible. Clearly, this is as much a test of the system as it is of the system's user, since the art still is, after all, in the creative use of the tools.

6.1.1 Merging Two Pieces

In the previous chapter we discussed three ways of combining two pieces of music to generate new ones. In this chapter we present the results of an experiment performed to evaluate how well each of the three music-merging methods is able to generate novel musical pieces while still preserving structural characteristics of the original training pieces.

Experiment Setup

In this experiment subjects were presented with 10 excerpts: 5 pairs, each generated with a different method:

Pair 1: Using method 1 with a single state space reconstruction.
Pair 2: Using method 2 with a single state space reconstruction.
Pair 3: Using method 3 with a single state space reconstruction.
Pair 4: Using method 3 with components modeled independently.
Pair 5: Randomly, using a gaussian distribution for each musical component. The parameters of the gaussian distributions used are shown in Table 6.1.

Table 6.1: Parameters of the Gaussian distributions used for the randomly generated musical sequences.

    Component   mu     sigma
    IOI         0      0.25
    duration    0.25   0.5
    pitch       60     10
    loudness    80     10

We used two excerpts per generation method because each method can generate a wide variety of musical sequences.
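The random baseline of Pair 5 can be sketched directly from Table 6.1. This is a minimal sketch using only the Python standard library; the function name is ours, and the sieve-quantization step applied to all excerpts in the experiment is omitted here.

```python
import random

# (mu, sigma) for each musical component, from Table 6.1:
GAUSS_PARAMS = {
    "IOI":      (0.0,  0.25),
    "duration": (0.25, 0.5),
    "pitch":    (60.0, 10.0),
    "loudness": (80.0, 10.0),
}

def random_excerpt(n_notes, seed=None):
    """Draw every component of every note independently from its Gaussian.
    (In the experiment these raw values were then quantized with the
    union of the training pieces' sieves; that step is not shown.)"""
    rng = random.Random(seed)
    return [{name: rng.gauss(mu, sigma)
             for name, (mu, sigma) in GAUSS_PARAMS.items()}
            for _ in range(n_notes)]
```

A call such as `random_excerpt(500, seed=1)` yields a note list in the same component format as the model-generated sequences, which is what allows the baseline to be sieved and rendered identically.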
Using only one excerpt might give misleading results that depend more on the particular example than on the generation method. On the other hand, using more excerpts might have made the experiment too long for subjects to remain interested and attentive. The two pieces used as training data were Ligeti's Etudes 4 and 6, Book 1. In all cases the generated sequences (including the random ones) were quantized to fit the union of the sieves of the two training pieces (see Section 5.2). The reason for including randomly generated excerpts was to test whether listeners confused some of the generated pieces with noise. As we pointed out in the previous chapter, method 3, and to a lesser degree method 1 (both with a single state space reconstruction combining all musical components), generate noisy sequences. All excerpts were between 30 and 40 seconds in duration except the training pieces, which were presented complete. The experiment was divided into two parts. In the first part, subjects were asked to rate the complexity of and their preference for each excerpt. They were also asked to cluster the excerpts into groups on the basis of similarity. Subjects were free to create any number of clusters between 1 and 5, and their choice depended on their ability to perceive distinct groups. Evidently, the perfect ear would have perceived the five groups corresponding to the five generation methods used. In the second part they were asked to evaluate the similarity between each of the excerpts and each of the training pieces. The excerpts were labeled [F1], [F2], [F3], etc. in the forms used and were organized as shown in Table 6.2. The forms used in the experiment can be found in Appendix C.

Table 6.2: Labels used for the excerpts in the experiment form and their corresponding Pair type (i.e. production method).
    F1:  excerpt from Pair 1
    F2:  excerpt from Pair 2
    F3:  excerpt from Pair 3
    F4:  excerpt from Pair 5
    F5:  excerpt from Pair 4
    F6:  excerpt from Pair 5
    F7:  excerpt from Pair 1
    F8:  excerpt from Pair 2
    F9:  excerpt from Pair 4
    F10: excerpt from Pair 3

Experiment Part 1 Results

None of the subjects perceived five distinct groups: 30.77% of the subjects distinguished four groups, 46.15% distinguished three, and 23.08% distinguished two. How were the excerpts grouped? Were the two excerpts generated with the same method (and thus belonging to the same Pair) actually clustered together? Table 6.3 shows the answer for each subject to this question. For convenience, the bottom line in the table shows the number of clusters found by each subject. Because all of the subjects perceived four clusters or fewer, there is necessarily some overlap of Pair types for every subject. Clearly, the fewer the number of clusters found, the greater the chances that two Pairs will be clustered together. Subject no.2, for example, correctly clustered together the excerpts belonging to Pairs 3, 4 and 5, but only defined two clusters total. This means that at least two of these Pairs were clustered together. Which were clustered together? More formally, if Pair P = {p1, p2} and Pair Q = {q1, q2}, the question then is: (p1 = q1) ∧ (p1 = q2) ∧ (p2 = q2)? Table 6.4 shows the results of this operation applied to every pair of Pairs for each subject.

Table 6.3: Were the two excerpts generated with the same method clustered together (1 = yes, 0 = no)?

    Subject no.:      1  2  3  4  5  6  7  8  9 10 11 12 13
    Pair 1:           1  0  1  1  0  1  0  0  0  0  0  1  0
    Pair 2:           1  0  1  0  0  1  0  0  0  1  1  0  0
    Pair 3:           0  1  0  0  0  1  0  1  0  0  0  0  1
    Pair 4:           1  1  1  0  0  0  1  1  0  1  1  0  0
    Pair 5:           1  1  0  1  0  1  0  0  0  1  0  1  0
    No. of Clusters:  4  2  3  4  4  2  3  4  3  3  2  3  3
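The predicate behind Table 6.4 can be sketched as follows. This is a minimal sketch; the function names and the illustrative cluster assignment (in the spirit of subject no.2's two clusters, with Pairs 3 and 5 sharing one) are ours.

```python
def same_cluster(assign, a, b):
    """assign maps an excerpt label to the id of its cluster."""
    return assign[a] == assign[b]

def consistently_together(assign, P, Q):
    """Table 6.4's question for Pairs P = (p1, p2) and Q = (q1, q2):
    (p1 = q1) and (p1 = q2) and (p2 = q2), i.e. all four excerpts
    of the two Pairs ended up in a single cluster."""
    (p1, p2), (q1, q2) = P, Q
    return (same_cluster(assign, p1, q1)
            and same_cluster(assign, p1, q2)
            and same_cluster(assign, p2, q2))

# Illustrative two-cluster assignment: Pair 3 (F3, F10) and
# Pair 5 (F4, F6) share cluster 1; Pair 4 (F5, F9) is cluster 2.
assign = {"F3": 1, "F10": 1, "F4": 1, "F6": 1, "F5": 2, "F9": 2}
print(consistently_together(assign, ("F3", "F10"), ("F4", "F6")))  # True
print(consistently_together(assign, ("F3", "F10"), ("F5", "F9")))  # False
```

Replacing the conjunction with the disjunction (p1 = q1) ∨ (p1 = q2) ∨ (p2 = q1) ∨ (p2 = q2) gives the weaker overlap question asked next.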
In this Table we see that subject no.2 clustered together Pairs 3 and 5. We can see that very few Pairs were consistently clustered together: Pairs 3 and 5 for subject no.2, as we have just seen, and Pairs 1, 2 and 3 for subject no.6.

Table 6.4: Which of the Pairs were consistently clustered together (1 = yes, 0 = no)?

    Subject no.:      1  2  3  4  5  6  7  8  9 10 11 12 13
    Pairs 1 and 2:    0  0  0  0  0  1  0  0  0  0  0  0  0
    Pairs 1 and 3:    0  0  0  0  0  1  0  0  0  0  0  0  0
    Pairs 1 and 4:    0  0  0  0  0  0  0  0  0  0  0  0  0
    Pairs 1 and 5:    0  0  0  0  0  0  0  0  0  0  0  0  0
    Pairs 2 and 3:    0  0  0  0  0  1  0  0  0  0  0  0  0
    Pairs 2 and 4:    0  0  0  0  0  0  0  0  0  0  0  0  0
    Pairs 2 and 5:    0  0  0  0  0  0  0  0  0  0  0  0  0
    Pairs 3 and 4:    0  0  0  0  0  0  0  0  0  0  0  0  0
    Pairs 3 and 5:    0  1  0  0  0  0  0  0  0  0  0  0  0
    Pairs 4 and 5:    0  0  0  0  0  0  0  0  0  0  0  0  0
    No. of Clusters:  4  2  3  4  4  2  3  4  3  3  2  3  3

To see to what degree the clusters defined by the subjects were a combination of excerpts belonging to different Pairs, we asked the question: is either of the two excerpts belonging to Pair P = {p1, p2} clustered with either of the two excerpts belonging to Pair Q = {q1, q2}? More formally, (p1 = q1) ∨ (p1 = q2) ∨ (p2 = q1) ∨ (p2 = q2)? Table 6.5 shows the results.

Table 6.5: Is either of the two excerpts belonging to Pair P = {p1, p2} clustered with either of the two excerpts belonging to Pair Q = {q1, q2}?

    Subject no.:      1  2  3  4  5  6  7  8  9 10 11 12 13
    Pairs 1 and 2:    0  1  0  1  1  1  1  1  1  1  1  1  1
    Pairs 1 and 3:    1  1  0  1  1  1  1  1  1  0  1  1  1
    Pairs 1 and 4:    0  1  0  1  0  1  1  1  1  0  0  1  1
    Pairs 1 and 5:    0  1  1  0  1  0  1  1  1  0  1  0  1
    Pairs 2 and 3:    0  1  1  1  1  1  1  1  1  0  1  1  1
    Pairs 2 and 4:    0  1  0  1  1  1  1  1  1  0  0  1  1
    Pairs 2 and 5:    0  1  1  0  1  0  1  1  1  0  0  0  1
    Pairs 3 and 4:    0  0  1  1  0  1  0  0  1  1  1  1  1
    Pairs 3 and 5:    1  1  1  0  1  0  1  1  1  0  1  0  1
    Pairs 4 and 5:    0  0  0  0  1  0  1  1  1  0  1  0  1
    No. of Clusters:  4  2  3  4  4  2  3  4  3  3  2  3  3

Looking at these tables we can see the variety of the clustering results across subjects. Figure 6-1 shows graphically the clusters made by a few subjects. At one extreme we have subject no.1, who found four clusters and correctly grouped the excerpts of Pairs 1, 2, 4 and 5.
He/She correctly isolated Pairs 4 and 2, but clustered one of the excerpts belonging to Pair 3 with Pair 1, and the other one with Pair 5. At the other extreme we have subject no.13. He/She found only two clusters, and only correctly clustered together the excerpts belonging to Pair 3; the excerpts belonging to the rest of the Pairs were divided into separate clusters. Subject no.10 also did a good job of clustering the excerpts. He/She was able to discriminate the noise excerpts (Pair 5) from the rest, but confused excerpts from Pair 1 with those of Pair 2, and those from Pair 3 with Pair 4. From our own observations of each of the three music-merging methods presented in the previous chapter, we were expecting subjects to frequently confuse the randomly generated excerpts (Pair 5) and those generated with method 3 on a single state space reconstruction (Pair 3). Adding across subjects in Table 6.5, we see that Pair 5 was clustered with Pair 1 by 8 subjects, with Pair 2 by 7 subjects, with Pair 3 by 9 subjects, and with Pair 4 by 6 subjects.

Figure 6-1: Clusters found by several subjects.

These results match our hypothesis: Pair 5 was most confused with Pair 3. However, the confusion between the gaussian noise excerpts (Pair 5) and those belonging to the rest of the Pairs was more frequent and homogeneous than we expected. We also expected subjects to clearly distinguish the excerpts generated with method 3 with components modeled independently (Pair 4) from the rest of the groups.
Again, adding across subjects in Table 6.5, we see that Pair 4 was clustered together with Pair 1 by 8 subjects, with Pair 2 by 9 subjects, with Pair 3 by 8 subjects, and with Pair 5 by 6 subjects. Thus, the confusion of Pair 4 with the rest of the Pairs was also very homogeneous. Why did we obtain such fuzzy results? Of course, since we discretized all excerpts with the same sieves, they all share the same "outside time" structure. This common structure no doubt plays a big role in the subjects' perception of similarities between the noise excerpts and the training based ones. It is likely that the longer the musical excerpts, the more subjects rely on these "outside time" structures (which are actually statistics abstracted from temporal placement) for evaluating structural similarities. Thus, it is likely that the sieves have a greater weight on the perception of similarity between the excerpts than we thought. How would the results change if we used shorter excerpts? What if we had not sieved the gaussian excerpts? How distinguishable would they be in that case? These questions will have to be left open for future experiments. Less relevant but nonetheless interesting were subjects' perceptions of the complexity of each of the generated excerpts, and their preferences among them. Figure 6-2 shows the resulting means (bars) and standard deviations (lines) of complexity and preference across subjects. The most preferred excerpts were 5, 8 and 9, which belong to Pairs 4, 2 and 4 respectively. Even so, there is a wide variance among subjects in their perception of both complexity and preference. There isn't a clear relationship between preference and complexity, but it is curious that the most preferred excerpt is also the most complex.
Figure 6-2: Means and standard deviations across subjects of their evaluation of Complexity and Preference for each of the excerpts.

Experiment Part 2 Results

To evaluate how well each of the generated excerpts preserved structural characteristics of the original training pieces, subjects were asked to rate the similarity between each of the generated excerpts and each of the two training pieces. Figure 6-3 shows the means (bars) and standard deviations (lines) of the estimates across subjects.

Figure 6-3: Means and standard deviations across subjects of the similarity estimates between each of the excerpts and each of the training pieces: Ligeti's Etude 4 and Etude 6.

The similarity estimates between the generated excerpts and Ligeti's Etude 6 are fairly clear. With a relatively small variance, excerpt number 5 comes first with the highest similarity estimate, followed closely by excerpt 8. As we see in Table 6.2, excerpt 5 belongs to Pair 4, which was generated with method 3, components modeled independently. Excerpt 8 belongs to Pair 2, which was generated with method 2, components modeled jointly. It is interesting to note that these two excerpts coincide with the two most preferred. How were the other excerpts belonging to these Pairs rated? The second excerpt belonging to Pair 4 is excerpt number 9, which is the third most similar to Etude 6. Strangely, the second excerpt belonging to Pair 2 (excerpt number 2) has the second lowest similarity to Etude 6. How can we explain these results? As discussed in Chapter 5, in contrast to methods 2 and 1, method 3 is characterized by the dominance of one piece. Which piece dominates depends on the state space dimensions chosen for the projection of the novel trajectory.
For the generation of the excerpts using this method we projected the newly generated trajectory onto the dimensions defined by the components of Etude 6. It is not strange, then, that Pair 4 resulted in the highest similarity estimate to Etude 6. Method 2 can best be described as a "collage" method, because many times a new state space trajectory will be defined by one of the pieces for a period of time until an intersection of trajectories belonging to different pieces is reached, in which case there may be a shift from one piece's dominance to the other. The reason excerpt 2 may have done so poorly is that the initial condition in the generation of this excerpt may have fallen close to the state space trajectory of Etude 4 and never shifted to the trajectory of Etude 6. In other words, this excerpt fell into a basin of attraction of Etude 4 and never left it. The similarity estimates between the generated excerpts and Ligeti's Etude 4 are not so clearly differentiated. The highest similarity scores were given to excerpts 7 and 8, which belong to Pairs 1 and 2, followed by excerpt 2, belonging also to Pair 2. The lowest similarity estimates were given to excerpts 3 and 5, which scored high in similarity with Etude 6. Excerpt 3 was generated with method 3, components modeled jointly, and excerpt 5 with method 3, components modeled independently. Which excerpt is the most similar to both training pieces? To answer this question we calculated the mean across subjects of the product of the similarity estimates between a given excerpt and Etude 4, and the same excerpt and Etude 6. More formally, let sim(a, b) be the similarity estimate between a and b. Then, the similarity between excerpt x and both training pieces is sim(x, Etude 4) · sim(x, Etude 6). Figure 6-4 shows these results. Excerpt 8 is clearly the excerpt with the most similarity to both Etude 4 and Etude 6. It belongs to Pair 2, and was thus generated with method 2.
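The combined score just defined is a one-line computation. This is a minimal sketch; the function name and the three-subject ratings are hypothetical illustrations, not data from the experiment.

```python
def combined_similarity(sim_e4, sim_e6):
    """Mean across subjects of sim(x, Etude 4) * sim(x, Etude 6)
    for one excerpt, given per-subject similarity ratings."""
    products = [a * b for a, b in zip(sim_e4, sim_e6)]
    return sum(products) / len(products)

# Hypothetical ratings for one excerpt from three subjects:
score = combined_similarity([4, 5, 3], [5, 4, 4])  # (20 + 20 + 12) / 3
```

Taking the product (rather than, say, the sum) rewards excerpts that are rated at least somewhat similar to both Etudes, and penalizes those similar to only one.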
Figure 6-4: Means and standard deviations across subjects of the similarity estimates between each of the excerpts and both training pieces.

It is not easy to state definite, unambiguous conclusions from the results obtained. From the results presented we could conclude that method 2 is the most successful in terms of preserving structural characteristics of both training pieces, because excerpt 8 scored the highest in the similarity estimates of both pieces combined. Yet excerpt 2 (also generated with method 2) obtained a very low combined similarity to both training pieces. As we mentioned in the previous chapter, a small change in the initial condition of a state space interpolation can greatly alter the evolution of the resulting observation sequence. Thus, while method 2 may be an excellent method for combining two pieces, an optimal result may depend on the careful choice of model parameters and initial conditions. Method 3 with components modeled independently strongly preserves structure from one of the training pieces. Thus, rather than as a method of combining two pieces, it might be better understood as a method of modulating one piece with another. The similarity ratings between the gaussian noise excerpts (Pair 5) and the two training pieces were surprisingly high. How is it possible that these excerpts were rated with similarity values comparable to those generated with some of our methods, particularly methods 1 and 3 with components modeled jointly? In our own perception, any of the excerpts generated with methods 1, 2 or 3 is clearly better at preserving structure from the training pieces than the gaussian noise examples. Even the excerpts generated with method 3 with components modeled jointly (which we saw in Chapter 5 generates noisy sequences) have large scale evolutions that the gaussian noise excerpts don't, thus potentially making them distinguishable.
Could it be that we have overestimated our methods, that we have overestimated the subjects' ability to perceive musical structure, or both? Most subjects who commented on the experiment expressed difficulty in distinguishing clear differences between the excerpts. Indeed, in general the results from both the clustering and the similarity experiments reveal poorer aural discriminability than we expected. But the large variances in all the results also reveal a wide range of ears.

Some subjects also talked about their experience with judging similarity. A non-musician said she noticed mood changes from excerpt to excerpt, so this was her main criterion for judging similarity. A professional musician said he noticed his criteria for judging similarity changed from pair to pair. At first he rated the similarity between excerpts without thinking much about it. Later, as he rationalized the task, he realized that his criteria for similarity were many, and that the weight of each criterion changed from excerpt to excerpt. Some features were at times more salient than others, and thus influenced his decisions about what was similar more strongly.

There is clearly a large variety of perceptual approaches among listeners. Different people focus on different aspects of a piece, and therefore judge similarity in different terms. Each individual's perception of the musical examples given is also influenced by their experience with music similar to the pieces used, and with music in general. It is noteworthy that subjects no. 1 and no. 10 (who had the best discrimination results) are professional musicians. In addition, subject no. 1 was acquainted with both training pieces. Indeed, two Mozart sonatas may sound very different to a listener acquainted with his music, while they may sound like the same piece to an "untrained" person.
Humans, just like machines, are trained to perceive the world in different ways, and the training data individuals feed on can be quite different. One man's music is another man's noise, and we see this in the arts continuously. For humans, too, perception requires learning.

6.2 Conclusions and Future Work

In this work we have presented an approach to the problem of inductive music structure modeling from a dynamical systems and signal processing perspective, one which focuses on the dynamic properties of music. The point of departure of the approach was the reconstruction of a state space from which to obtain essential characteristics of music's dynamics, such as its number of degrees of freedom. This multidimensional representation allowed us to obtain generalizations about the structure of a given piece of music, from which we generated novel musical sequences. We presented three main types of generalization strategies (Chapter 5): state space interpolation, abstraction of the dynamics from the states, and decomposition of the musical signals using wavelets and PCA.

We used local linear models to model the reconstructed state space structure deterministically because of the method's ability to generate a variety of transformations, and because of its speed and simplicity. There are, of course, other ways of modeling the state space's reconstructed structure with global methods, such as polynomials, neural networks or radial basis functions. Our explorations of radial basis functions were brief, so we did not document them here. Nonetheless, it is worth mentioning that we abandoned this method because, in contrast to the local linear models used, sequences generated with it tended to fall into attractors and thus produced infinite cycles. In addition, unless one uses almost as many radial basis functions as there are points in state space, the resulting interpolations are smooth.
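The local linear idea can be sketched as follows (a minimal illustration under our own simplifying assumptions, not the implementation used in this work): to predict where a state goes next, fit an affine map from its nearest neighbors in the reconstructed state space to their observed successors, and apply that map to the query state.

```python
import numpy as np

def local_linear_predict(states, query, k=4):
    """Predict the successor of `query` from an affine least-squares fit
    of the k nearest states to their observed successors."""
    X, Y = states[:-1], states[1:]            # states and their successors
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    A = np.hstack([X[idx], np.ones((k, 1))])  # affine design matrix
    coef, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
    return np.append(query, 1.0) @ coef

# Toy trajectory: a sampled circle, whose dynamics are a fixed rotation.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
traj = np.column_stack([np.cos(t), np.sin(t)])
pred = local_linear_predict(traj, traj[10])   # should land near traj[11]
```

Using few neighbors tracks fine local detail, while using many neighbors averages over it; that averaging is the smoothing (filtering) effect of local linear models discussed in this chapter.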
Therefore, this method is not very useful unless one is dealing exclusively with continuous sequences. Other modeling strategies remain to be explored. Of particular importance is the incorporation of stochastic methods into the overall system.

We discussed the tight relationship between states and dynamics, and how a collection of states allows only a limited variety of state space paths (dynamics), and vice versa. We demonstrated the use of rotations of the state space to generate novel state sequences that preserve the exact same dynamics. However, we did not explore the reverse: keeping the same states while changing their temporal relationships. The classical (and probably only frequently used) transformation in this respect is reversing the sequence, but evidently other reorderings are possible. These remain to be explored.

By decomposing musical sequences hierarchically we were able to achieve an additional level of flexibility that allowed us to perform transformations at different time scales. We discussed two decomposition methods, wavelets and PCA, but we did not explore the more robust family of transformations known as ICA. Particularly for the case of music, where nonlinear relationships frequently exist, PCA offers a limited solution to the problem of obtaining non-redundant decompositions. Thus, future iterations of this work will explore more robust decomposition methods. Related to this problem is that of stream segregation, discussed below.

6.2.1 System's Strengths, Limitations and Possible Refinements

In our interest to be as unbiased as possible towards music, we have tried to make our approach to music structure modeling very general. In doing so we have paid the price of generality: many of the sequences that we have generated are rather coarse approximations that frequently miss some perceptually important characteristics.
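The state space rotations mentioned above can be sketched as follows (a toy example with an arbitrary test signal, not code from this work): applying an orthogonal rotation to a delay-embedded trajectory yields entirely new states while preserving every pairwise distance, and hence the geometry of the dynamics.

```python
import numpy as np

def delay_embed(s, dim):
    """Delay-embed a scalar sequence into dim-dimensional states."""
    return np.column_stack([s[i:len(s) - dim + i + 1] for i in range(dim)])

def rotate(states, theta):
    """Rotate 2-D states by theta: an orthogonal, distance-preserving map."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return states @ R.T

s = np.sin(np.linspace(0, 8 * np.pi, 200))  # toy "musical" signal
X = delay_embed(s, dim=2)
Xr = rotate(X, np.pi / 3)

# The trajectory's shape is unchanged: pairwise distances are identical.
d0 = np.linalg.norm(X[0] - X[50])
d1 = np.linalg.norm(Xr[0] - Xr[50])
```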
As we stated in Chapter 1, it is really not possible to model all kinds of music with a single approach, so while the approach presented here may be a reasonable point of departure for any musical piece, the particularities of each work can only be faithfully captured with more specific and more refined methods. What refinements could be implemented that would still be applicable to a broad variety of music?

We presented methods of decomposing musical signals hierarchically, making it possible to transform the details or the large scale approximations of the dynamics of music independently. But we did not discuss or attempt to extract repeating patterns, such as "themes" or motives, that could be treated independently from other musical materials. Thus, a further refinement of the system would be to try to find these kinds of structures, and to segment the music in more detail. Another, more ambitious task would be to segregate streams that may be running in parallel in a single musical component such as pitch. Musical multiplexing is quite common in monophonic pieces, and the separation of perceivable streams or voices would allow greater control and refinement of the possible transformations applied to music.

While we briefly mentioned the possibility of modeling the dynamic properties of sieves, in all the examples presented and experiments performed we modeled the sieves as a stationary structure spanning the entirety of the piece(s). A more refined yet still general modeling approach would consider sieves not just as stationary "outside-time" structures, but as dynamic ones as well.

As can be seen and heard, the approach to music structure modeling we have presented is quite good at capturing the large scale dynamics of music and at generating novel sequences at these large temporal scales.
But due to the filtering introduced by the local linear models, we have had to sacrifice some of the high energy dynamics which are so clearly perceivable and thus fundamental, frequently making the short time-scale sequences sound unrelated to the training piece.¹ Thus, we have the reverse of the tradeoff typically found in Markov models and in some music-generative applications of neural networks [29]. A possible solution to this problem might be a more in-depth use of the hierarchical decomposition of musical sequences presented in Chapter 4. Because the high frequency content is removed from the large scale dynamics (e.g. wavelet approximations), we could safely use many neighbors in the state space interpolation without causing any additional filtering. Meanwhile, the state space reconstruction of the details would be interpolated with very few neighbors to avoid filtering high frequencies in the manner discussed in the previous chapter. By combining these two models it might be possible to obtain good small scale variations as well as large scale ones.

¹ Remember that, in order to reduce the filtering, we typically had to reduce the number of points in the velocity estimates or increase the exponent p in Equation 3.8.

Appendix A Notation

x : Scalar variable.
x (boldface) : Vector.
s[n] : Discrete scalar signal.
s[n] (boldface) : Discrete multi-dimensional signal.
s(t) : Continuous scalar signal.
s(t) (boldface) : Continuous multi-dimensional signal.
⟨a, b⟩ : Inner product of a and b.
ab^T : Outer product of a and b, where a and b are vertical vectors.
||x − y|| = sqrt((x₁ − y₁)² + ... + (x_m − y_m)²) : Euclidean distance between x and y.
C₂(M) = M! / (2!(M − 2)!) : Number of pairs that can be chosen from M elements.
C² : Twice continuously differentiable function.
∧ : Logical AND.
∨ : Logical OR.
E[x] : The expected value of x.

Appendix B Pianorolls

B.1 Etude 4
Figure B-1: Ligeti's Etude 4, Book 1.

B.2 Etude 6

Figure B-2: Ligeti's Etude 6, Book 1.

B.3 Interpolation: Etude 4

Figure B-3: Selection of generated sequences for different model parameters. The training piece is Ligeti's Etude no. 4, Book 1.
B.4 Combination of Etudes 4 and 6, Method 1

Figure B-4: 500 note fragments from combinations of Ligeti's Etudes 4 and 6, Book 1, using method 1, with 20% for Etude 4 and 80% for Etude 6. Each panel is a combination of the Etudes using different parameter values.
B.5 Combination of Etudes 4 and 6, Method 2

Figure B-5: 500 note fragments from combinations of Ligeti's Etudes 4 and 6, Book 1, using method 2. Each panel is a combination of the Etudes using different parameter values.
B.6 Combination of Etudes 4 and 6, Method 3 with Components Modeled Jointly

Figure B-6: 500 note fragments from combinations of Ligeti's Etudes 4 and 6, Book 1, using method 3 with parameters modeled jointly in a single state space. Each panel is a combination of the Etudes using different parameter values.
B.7 Combination of Etudes 4 and 6, Method 3 with Components Modeled Independently

Figure B-7: 500 note fragments from combinations of Ligeti's Etudes 4 and 6, Book 1, using method 3 with components modeled independently. Each panel is a combination of the Etudes using different parameter values.
[Additional panels (dim = 10 and dim = 12, with varying nn and p values) continue the figures of Section B.7; pitch versus time in beats.]

Appendix THREE

Experiment Forms

Experiment 2, Part 1

For each fragment, answer the following questions:

1. How complex is the fragment?
2. How much do you like this fragment?
3. What group does the fragment belong to?

After listening to all the fragments, cluster them into a maximum of five groups based on SIMILARITY.
If you feel the fragments can only be grouped in two, use only numbers 1 and 2; if you feel they can only be grouped in three, use numbers 1, 2 and 3; and so on. For both complexity and preference we use a scale from 0 to 6, 0 being the least complex and least preferred, 6 being the most complex and most preferred. Please listen to all the musical fragments before answering the questions. You might want to use pen and paper to take notes of the groupings you may come up with before putting your answers in the form.

[Answer table: one row per fragment, with columns for complexity, preference, and group.]

How familiar are you with the music you just heard?
- never heard any of it before
- sounds familiar
- I know I've heard it before, but don't know what it is
- I know exactly what one (some) of them is (are): give name(s)

[Submit Answers]

Experiment 2, Part 2

Rate the SIMILARITY between each of the fragments [F1], [F2], ..., [F10] making up the rows of the following table, and each of the two pieces labeled [A] and [B] making up the columns; 0 being minimum similarity, 6 being maximum similarity. Please listen to the comparison pieces [A] and [B] carefully before you begin, and feel free to go back and listen to them as many times as you want.

[Similarity table: rows [F1] through [F10], columns [A] and [B].]

How similar are [A] and [B]?