Hierarchical Music Structure Analysis,
Modeling and Resynthesis: A Dynamical
Systems and Signal Processing Approach.
by
Victor Gabriel Adán
B.M., Universidad Nacional Autónoma de México (2002)
Submitted to the Program in Media Arts and Sciences,
School of Architecture and Planning,
in partial fulfillment of the requirements for the degree of
Master of Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September, 2005
© Massachusetts Institute of Technology 2005. All rights reserved.
Author
Program in Media Arts and Sciences
August 5, 2005

Certified by
Barry L. Vercoe
Professor of Media Arts and Sciences
Thesis Supervisor

Accepted by
Andrew B. Lippman
Chairman, Departmental Committee on Graduate Students
Hierarchical Music Structure Analysis, Modeling
and Resynthesis: A Dynamical Systems and
Signal Processing Approach.
by Victor Gabriel Adán
Submitted to the Program in Media Arts and Sciences,
School of Architecture and Planning, on August 5, 2005,
in partial fulfillment of the requirements
for the degree of Master of Science
Abstract
The problem of creating generative music systems has been approached
in different ways, each guided by different goals, aesthetics, beliefs and
biases. These generative systems can be divided into two categories:
the first is based on ad hoc definitions of the generative algorithms; the
second on the idea of modeling and generalizing from preexisting music
for the subsequent generation of new pieces. Most inductive models
developed in the past have been probabilistic, while the majority of
the deductive approaches have been rule based, some of them with very
strong assumptions about music. In addition, almost all models have
been discrete, most probably influenced by the discontinuous nature of
traditional music notation.
We approach the problem of inductive modeling of high level musical
structures from a dynamical systems and signal processing perspective,
focusing on motion per se independently of particular musical systems
or styles. The point of departure is the construction of a state space
that represents geometrically the motion characteristics of music. We
address ways in which this state space can be modeled deterministically,
as well as ways in which it can be transformed to generate new musical
structures. Thus, in contrast to previous approaches to inductive music
structure modeling, our models are continuous and mainly deterministic.
We also address the problem of extracting a hierarchical representation
of music from the state space and how a hierarchical decomposition can
become a second source of generalization.
Thesis supervisor: Barry L. Vercoe, D.M.A.
Title: Professor of Media Arts and Sciences
Thesis Committee
Thesis supervisor
Barry L. Vercoe
Professor of Media Arts and Sciences
Massachusetts Institute of Technology
Thesis reader
Rosalind Picard
Professor of Media Arts and Sciences
Massachusetts Institute of Technology
Thesis reader
Robert Rowe
Professor
New York University
Acknowledgments
I would first of all like to thank my advisor, Barry Vercoe, for taking
me into his research group and providing me with a space of complete
creative freedom. None of this work would have been possible without
his kind support.
Many thanks to the people who took the time to read the drafts of
this work and gave me invaluable criticisms, particularly my readers:
Professors Rosalind Picard, Robert Rowe and again Barry Vercoe.
Many special thanks go to the other Music, Mind and Machine group
members: Judy Brown, Wei Chai, John Harrison, Nyssim Lefford, and
Brian Whitman for being such incredible sources of inspiration and
knowledge.
Special thanks to Wei for her constant willingness to listen to my math
questions and for her patience. To Brian for being my second "advisor". His casual, almost accidental lessons gave me the widest map of
the music-machine learning territory I could have wished for. Thanks to
John for being a tremendous "officemate" and friend. His intelligence,
skepticism and passion for music and technology made him a real pleasure to work with and a constant source of inspiration.
Many other Media Lab members deserve special mention as well: Thanks
to Hugo Solis for helping me with my transition into the lab, doing
everything from paper work to housing. I probably wouldn't have come
here without his help. Also thanks to all 45 Banks people for being such
wonderful friends and roommates: Tristan Jehan, Luke Ouko, Carla
Gomez and again, Hugo.
In addition I have to thank all the authors cited in this text. This thesis
wouldn't have been without their work. I want to give special thanks
to Bernd Schoner for his beautiful MIT theses. They were a principal
source of inspiration and insight for the present work.
My greatest gratitude to my family. Their examples of perseverance and
faith have been my wings and strength.
Last but certainly not least, I want to thank my dear wife "Cosi" for
her love and support (and her editing skills). She is the most important
rediscovery I have made in my search for the meaning of it all.
Table of Contents

1 Introduction and Background
   1.1 Motivations
   1.2 Generative Music Systems and Algorithmic Composition
   1.3 Modeling
       1.3.1 Models of Music
       1.3.2 Our Approach to Music Structure Modeling

2 Musical Representation

3 State Space Reconstruction and Modeling
   3.1 State Spaces
   3.2 Dynamical Systems
       3.2.1 Deterministic Dynamical Systems
       3.2.2 State Space Reconstruction
       3.2.3 Calculating the Appropriate Embedding Dimension
       3.2.4 Calculating the Time Lag τ
   3.3 Modeling
       3.3.1 Spatial Interpolation = Temporal Extrapolation

4 Hierarchical Signal Decomposition
   4.1 Divide and Conquer
       4.1.1 Fourier Transform
       4.1.2 Short Time Fourier Transform (STFT)
       4.1.3 Wavelet Transform
       4.1.4 Data Specific Basis Functions
       4.1.5 Principal Component Analysis (PCA)
   4.2 Summary

5 Musical Applications
   5.1 Music Structure Modeling and Resynthesis
       5.1.1 Interpolation
       5.1.2 Abstracting Dynamics from States
       5.1.3 Decompositions
   5.2 Outside-Time Structures
       5.2.1 Sieve Estimation and Fitting
   5.3 Musical Chimaeras: Combining Multiple Spaces

6 Experiments and Conclusions
   6.1 Experiments
       6.1.1 Merging Two Pieces
   6.2 Conclusions and Future Work
       6.2.1 System's Strengths, Limitations and Possible Refinements

Appendix A  Notation
Appendix B  Pianorolls
   B.1 Etude 4
   B.2 Etude 6
   B.3 Interpolation: Etude 4
   B.4 Combination of Etudes 4 and 6, Method 1
   B.5 Combination of Etudes 4 and 6, Method 2
   B.6 Combination of Etudes 4 and 6, Method 3 with Components Modeled Jointly
   B.7 Combination of Etudes 4 and 6, Method 3 with Components Modeled Independently
Appendix C  Experiment Forms
Bibliography
CHAPTER ONE
Introduction and Background

1.1 Motivations
There are two motivations for the present work: the first motivation
comes from my interest in music analysis. Several music theories have
been developed as useful tools for analyzing and characterizing music.
Most of these theories, such as Riemann's theory of tonal music, Schenkerian analysis [31] and Forte's atonal theory, are specific to particular
styles or systems of composition. It would be interesting to see the development of an analytical method useful for any kind of music. This
method would have to be based on features that are present in all music,
the element of motion being the only constant characteristic. Thus, my
interest in finding a more general method of musical analysis, applicable
to a variety of music, has motivated me to classify music in terms of
the different kinds of motion it manifests, encouraging me to define a
taxonomy of motion. While the importance of motion in music is obvious, no work that I know of has systematically studied music from this
perspective.
The second motivation comes from my interest in understanding the behavior of my musical imagination. The process of composing usually
includes writing or recording the imagined sound evolutions as they are
being heard in one's mind. Multiple paths and combinations are explored during the process of imagining new music, so that the score we
ultimately write is but an instance of the multiple combinations explored
in one's mind. It is also a simplification of the imagined music because
the representation we use to record the imagined sound evolutions is
incomplete. In other words, we are unable to represent accurately every detail of the imagined universe. Thus, this representation is like a
photograph of the dynamic, ever-changing world of the imaginary. Why
decide on one sequence or combination over another? Is there a single
best sequence, a better architecture? My personal answer to this question is no. This has led me to become interested in pursuing the creation
of a meta-music: a system that generates the fixed notated music along
with all its other implied possibilities.
1.2 Generative Music Systems and Algorithmic Composition
Algorithmic composition can be broadly defined as the creation of algorithms, or automata in general, designed for the automatic generation of
music.¹ In other words, in algorithmic composition the composer does
not decide on the musical events directly but creates an intermediary
that will choose the specific musical events for him. This intermediary
can be a mechanical automaton or a mathematical description that constrains the possible musical events that can be generated. The definition
here is deliberately broad to suggest the continuous spectrum that exists
in the levels of detachment between what the composer creates and the
actual sounds produced.²
Every algorithmic composition approach can be placed in a continuum
between ad hoc design and music modeling. Ad hoc designs are those
where the composer invents or borrows algorithms with no particular
music in mind. The composer doesn't necessarily have a mental image
of what the musical output will be. The approach is something like:
"Here's an algorithm, and I wonder what this would sound like." In
music modeling, the composer deliberately attempts to generalize the
music he hears in his mind, so the algorithm is a deliberate codification
of some existing music.
We find examples of ad hoc approaches as early as 1029 in music theorist
Guido D'Arezzo's Micrologus. Guido discusses a method for automatically composing melodies using any text by assigning each note in the
pitch scale to a vowel [26]. Because there are more pitches than vowels,
the composer is still free to choose between the multiple pitch options
available. Similarly, Miranda borrows preexisting algorithms from cellular automata and fractal theory for automatic music generation [28].
¹ For a complete definition of algorithm and a discussion of how they relate to music
composition, see [26].
² One could argue that any composition is algorithmic since the composer does not
define the specific wave-pressure changes, only the mechanisms to produce them.
While inventing ad hoc algorithms for music composition is a fascinating
endeavor, in this work we are interested in learning about our intuitive
musical creativity and developing a generative system that grows from
musical examples. Thus, we discuss the design of a generative music
system based on modeling existent music.
1.3 Modeling
All models are wrong, but some are useful.
George E.P. Box
There are no best models per se. The most effective model will depend
on the application and goal. Different goals suggest different approaches
to modeling. As expressed by our motivations, our models intend to
serve a double purpose: the first is to obtain some understanding about
the inner workings of a given piece and, hopefully, gain insight into
the composer's mind. The second is for the models to be a powerful
composition tool. It is difficult, if not impossible, to come up with a
model that achieves these two goals simultaneously for a variety of pieces
because a generative model might not be the most adequate for analysis
and vise versa. Music structure is so varied, so diverse, that it seems
unlikely that a single modeling approach could be used successfully for
all music and for all purposes.
What are the criteria for choosing a model? Is the model simple? The
Minimum Description Length principle [33], which essentially defines
the best model as the one that is smallest with regard to both form and
parameter values, formalizes this criterion. Other criteria to
consider are:
Robustness: Does it lend itself well to a variety of data (e.g. musical
pieces)?
Prediction: Can it accurately predict short term or long term events?
Insight: Does it provide new meaningful information about the data?
Flexibility: As a generative system, what is the range or variety of new
data that the model can generate?
1.3.1 Models of Music

Deduction vs. Induction
Brooks et al. describe two contrasting approaches to machine modeling:
the inductive and the deductive [4]. Essentially, the difference lies in who
performs the analysis and the generalization: the human programmer or
the machine. In a deductive model, we analyze a piece of music and
draw some rules and generalizations from it. We then code the rules
and generalizations and the machine deduces the details to generate new
examples. In an inductive approach, the machine does the generalization.
Given a piece (or set of pieces) of music, the machine analyzes and learns
from the example(s) to later generate novel pieces.
While many pieces may share common features, each piece of music has
its own particular structure and "logic". A deductive approach implies
that one must derive the general constants and particularities of a piece
or set of pieces for the subsequent induction by the machine. This is a
time-consuming task that could only be done for a small set of pieces
within one's lifetime. The classic music analysis paradigm is at the root
of this approach. One can certainly learn a lot about music in this way,
but it seems to us that attempting to have the machine automatically
derive the structure and the generalization is not only a more interesting
and challenging problem, it also might shed light on the way we learn
and about human cognition in general. This approach also encourages
one to have a more general view regarding music and to be as unbiased as
possible (hopefully changing our own views and biases in the process).
In a deductive approach we are filtering the data. We are telling the
machine how to think about music and how to process it. In an inductive
approach the attempt is to have the machine figure out what music is.
We could alternatively attempt to model our own creativity directly, but
it seems to us that the "logic" or structure and generative complexity of
the highly subconscious and hardly predictable creative mind is at a far
reach from our conscious self-probing and introspection.³ Rather than
asking ourselves what might be going on in our mind while we imagine a
new piece of music and trying to formalize the creative process, we can
let our imagination free, without probing, and then have the machine
analyze and model the created object.
³ This nebulous and partly irrational experience of the imaginary has been expressed
many times by different composers, for example [34, 15].
Continuous vs. Discontinuous
Most generative music systems and analysis methods assume a discrete⁴
(and most times finite) musical space. This is a natural assumption
since the dominating musical components in western music have been
represented with symbols for discrete values.⁵ The implication is that
most models of music depart from the idea that a piece is a sequence of
symbols taken from a finite alphabet. Almost all the generative systems
we are aware of are discrete ([21], [4], [19], [38], [12], [8], [9], [41], [7], [32], [30], [29], [3]).
This thesis, however, proposes a continuous approach to modeling music
structure.
Deterministic vs. Probabilistic
There is no way of proving the correctness of the position of
'determinism' or 'indeterminism'. Only if science were
complete or demonstrably impossible could we decide such
questions.
Mach (Knowledge and Error, Chapt XVI.11)
Should a music structure model be deterministic or stochastic? Consider two contrasting musical examples: on one extreme there is Steve
Reich's Piano Phase. The whole piece consists of two identical periodic
patterns with slightly different tempi (or frequencies).⁶ Piano Phase can
be straightforwardly understood as a simple linear deterministic stationary system. On the other extreme we can place Xenakis' string quartet
ST/4-1, 080262. It would make sense to model ST/4-1, 080262 stochastically since we know it was composed in this way! In between these two
extremes there are a huge variety of compositions with much more complex and intricate structures. There may be pieces that start with clearly
perceivable repeating patterns and that gradually evolve into something
apparently disorganized. A piece like this might best be modeled as a
combination of deterministic and stochastic components that are a function of time. In addition, these combinations may occur at multiple
time scales, in which case it would be better modeled as a combination
of a deterministic component at one level, and a stochastic component
at another. Thus, rather than trying to find a single global model for a
whole piece, we might want to model a piece as a collection of multiple,
possibly different, models.

⁴ Since the classical period, notation for loudness has commonly included symbols
for continuous transitions, but loudness is typically not considered important enough
to be studied. If it is, its continuous nature is typically ignored.
⁵ Indeed, the recurrent inclusion of the continuum in notated pitch space (starting
probably with Bartók, continuing with Varèse and Xenakis, and culminating in
Estrada) has made it difficult or impossible for some music theoretic views to approach
these kinds of music.
⁶ The point of the piece is the perception of the evolution of the changing phase
relationships between the two patterns.
Brief Thoughts on Markov Models
It is intriguing to see how the great majority of the inductive machine
models of music are discrete Markov models. Recall that a kth order
Markov model of a time series is a probabilistic model where the
probability of a value at time step n + 1 in a sequence s is conditioned on
the k previous values:

p(s_{n+1} | s_n, s_{n-1}, ..., s_{n-k+1})
Why discrete and
why Markovian? Several papers on Markov models of music are based
on the problem of reducing the size and complexity of the conditional
probability tables by the use of trees and variable orders ([41], [3], [30]).
Again, the discrete nature of the models most probably comes from the
view of music as a sequence of discrete symbols taken from an alphabet.
Why not model the probability density functions parametrically, as with
mixtures of Gaussians?
Are Markov models really good inductive models of music? What kind
of generalization can they make? Most applications of discrete Markov
models estimate the probability functions from the training data. Usually, zero probabilities are assigned to unobserved sequences. If this is
the case, it is impossible for a simple kth order Markov model to generate any new sequences of length k + 1. All new and original sequences
will have to be of length k +2 or greater. Take for example the following
simple sequence which we assume to be infinite:
1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 3, 2, ...    (1.1)
A first order model of this sequence can be constructed statistically by
counting the relative frequencies of each value and of each pair of values.
The relative frequencies of all possible pairs derived from this sequence
can be easily visualized in the following table, where each fraction in the
table represents the number of times a row value is immediately followed
by a column value, divided by the total number of consecutive value or
sample pairs found in the sequence. From these statistics we can now
derive the marginal probabilities of individual values, and by Bayes' rule
Table 1.1: Relative frequency of all pairs of values found in sequence 1.1

        1     2     3
  1     0    1/4    0
  2    1/4    0    1/4
  3     0    1/4    0
we find the conditional probabilities:

p(s_{n+1} | s_n) = p(s_{n+1}, s_n) / p(s_n)
From this table we can see that new sequences generated by this joint
probability function will never produce pairs (1,1), (1,3), (3,1), (3,3),
(2,2) or extrapolate to other values, such as (4,5). In other words, all
length 2 sequences that this model can generate are strictly subsets of
sequence 1.1, while those of length 3 or greater may or may not appear in
the original sequence. To overcome this limitation one could give unobserved sequences a probability greater than zero, but this is essentially
adding noise. Looking at generated sequence segments of length 3 or
greater we may now wonder how these are related to the original training pieces. The limitation of this model soon becomes apparent. New
sequences generated from this model will have some structural resemblance to the original only at lengths l ≤ k + 1 (l ≤ 2 in this example),
but not at greater lengths. From the first order model in the present
example it is possible to obtain the following sequence:
1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 1, 2, 1, 2, 1, 2, 1, 2, 3, 2, 3, 2, 1, 2, 3, 2, 1, ...
Because p(3|2) = p(1|2) = 0.5, the generated sequence will fluctuate
between 1 and 3 with equal probabilities every time a 2 appears. This
misses what seems to be the essential quality of the sequence: its periodicity. Increasing the order of the model to 2, we are able to capture
the periodicity unambiguously. But what generalization exists? Now the
model will generate the whole original training sequence exactly. This
is the key idea we explore in this work. We want the model to be able
to abstract the periodicity and generate new sequences that are different from the original but that have the same essentially periodic quality.
This information can be obtained by looking for structure and patterns
in the probability functions derived from the statistics, but in this work
we will present a different approach.
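The limitations just described are easy to verify empirically. The following sketch (plain Python, with training data invented to repeat sequence 1.1) estimates the first-order conditional probabilities behind Table 1.1 and confirms that generation can never produce an unobserved pair such as (1, 3):

```python
from collections import Counter, defaultdict
import random

# Training data: many repetitions of the periodic sequence 1, 2, 3, 2, ...
seq = [1, 2, 3, 2] * 100

# Relative frequencies of consecutive pairs (the statistics of Table 1.1).
pair_counts = Counter(zip(seq, seq[1:]))

# Conditional probabilities by Bayes' rule:
# p(s_{n+1} | s_n) = p(s_{n+1}, s_n) / p(s_n).
marginal = Counter(seq[:-1])
cond = defaultdict(dict)
for (a, b), count in pair_counts.items():
    cond[a][b] = count / marginal[a]

# Every 2 is followed by 1 or 3 with (nearly) equal probability,
# so the model cannot capture the periodicity of the sequence.
print(cond[2])

# Generate a new sequence by sampling from the conditionals.
random.seed(0)
state, generated = 1, [1]
for _ in range(200):
    state = random.choices(list(cond[state]), list(cond[state].values()))[0]
    generated.append(state)

# All pairs in the generated sequence were observed in training:
# unseen pairs such as (1, 3) or (3, 3) can never occur.
assert set(zip(generated, generated[1:])) <= set(pair_counts)
```

Raising the order to 2 removes the ambiguity at 2 but, as argued above, then merely reproduces the training sequence.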
One more problem is that of stationarity. In our example the sequence
repeats indefinitely, making a static probabilistic model adequate. But
music continuously evolves and is very seldom static. Sequences generated with a simple model like the one discussed above may resemble the
original at short time scales, but the large scale structures will be lost.
This problem could be addressed by dynamically changing the probability functions, i.e. with a stochastic model. Modeling non-stationary
signals can also be done through a hierarchical representation, where the
conditional probabilities are estimated not only sequentially, but also vertically across hierarchies. Some concrete applications of this approach
can be found in image processing [10], and to our knowledge, there have
not been similar approaches in music modeling. Similar problems have
been observed using other methods such as neural networks [29], where
the output sequences resembled the original training sequence only locally due to the note-to-note approach to modeling.
The Markov model example we have given here is certainly extremely
simplistic, but hopefully it makes clear some of the problems of this
approach to music modeling for the purpose of generating novel pieces.
1.3.2 Our Approach to Music Structure Modeling
As suggested in our motivations, our main interest is the abstraction
of the essential qualities of the dynamic properties of music and their
use as sources for the generation of novel pieces. By dynamics we mean
the qualitative aspects of motion in music, rather than the loudness
component as is usually used by musicians. Here we approach musical
sequences in a similar way as a physicist would approach the motion of
physical objects.
The notion of the dynamical properties of a musical sequence is rather
abstract, and we hope to clarify it with a concrete example. First, consider the following question: why can we recognize a tune even when
replacing its pitch scale (e.g. changing from major to minor), or when
the pitch sequence is inverted? What are the constants that are preserved that allow us to recognize the tune?
Take for example the first 64 notes (8 measures) of Bach's Prelude from
his Cello Suite no.1 (Figure 1-1). There are multiple elements that we
can identify in the series: the pitches, the durations, the scales and harmonies the pitches imply, the pitch intervals and the temporal relationships between these elements. How can we characterize these temporal
Figure 1-1: Top: First 8 measures of Bach's Prelude from Cello Suite
no.1. Bottom: Alternative notation of the same 8 measures of Bach's
Prelude. Here the note sequence is plotted as connected dots to make
the contour and the cyclic structure of the sequence more apparent.
relationships? Can we say something about the regularity/irregularity
of the sequence, the changes of velocity, the contour? An abstract representation of the motion in Bach's Prelude might look something like
(up, up, down, up, down, up, down, down, ... ) repeating several times.
This is a useful but coarse approximation, preserving only the ordering
of intervals. We might also want to preserve some information about the
size of the intervals, their durations and the quality of the motion from
one point to the next: is it continuous, is it discrete? Whatever the case,
we see from Figure 1-1 (Bottom) that this general sequence is the only
structure in the Prelude fragment, repeating and transforming gradually
to match some harmonic sieve that changes every two measures. Every
two measures the sequence is slightly different, but in essence the type of
motion of the eight measures is the same. Thus, we can consider the sequence as being composed of two independent structures: the dynamics
and the sieves through which these are filtered or quantized.
In this thesis we focus on the analysis and modeling of these dynamic
properties. We address ways in which we can decompose, transform,
represent and generalize the dynamics for the purpose of generating new
ones. Rather than trying to extend the Markovian model as suggested
above, we approach the problem from a deterministic perspective. We
explore the use of a method that allows for a geometric representation
of time series, and discuss different ways to model and generalize the
geometry deterministically.
CHAPTER TWO
Musical Representation
Structure in music spans about seven orders of magnitude, from approximately 0.0001 seconds (~10 kHz) to about 1,000 seconds (~17 min.) [11].
In the present work we focus on the upper four orders (between 0.1 secs.
and 1,000 secs.) for two reasons: first, these orders correspond to those
typically represented in musical scores; second, there is a clear difference
between the way we perceive sound below and above approximately
0.02 seconds.
Below this threshold, we hear pitch and timbre, and above it we hear
rhythm and sound events or streams. The perception can be so clearly
different that they feel like two totally independent things: a high-level
control signal driving high-speed pressure changes. These high-level control signals are what we are interested in modeling and transforming.
Thus, the musical representations we will use are not representations of
the actual sound, but abstract representations of these high-level signals.
As of today, it is still very difficult to extract these high level control
signals from the actual audio signal. Good progress has been made in
tracking the pitch of individual monophonic instruments and in some
special cases from polyphonic textures. But the technology is still far
from achieving the audio segregation we would like. Therefore, except
for simple cases where the evolution of pitch, loudness and some timbral
features can be relatively well extracted (such as simple monophonic
pieces), our point of departure is the musical score.
Two basic types of scores exist. In the first, the score represents the
evolution of perceptual components, such as pitch, loudness, timbre, etc.
In the second, traditionally called tablature notation, the score is a representation of the performance techniques required to obtain a particular
sound. In essence they are both control signals driving some salient feature of sound or some mechanism for its production. There are several
ways in which these high level control signals can be represented before
being modeled and transformed. Some representations will be more appropriate than others depending on their use and the type of music to
be represented.
Representation of Time
The simplest and most common way of representing a digitized scalar
signal is as a succession of values at equal time intervals:
s[n] = s_0, s_1, ..., s_n    (2.1)
Since it is assumed that all samples share the same duration, this information need not be included in the series.
The same is true for a multidimensional signal where each dimension
describes the evolution of a musical component. For example, a series
describing the evolution of pitch, loudness and brightness might look like
this:
s[n] = [ p_0, p_1, ..., p_n
         l_0, l_1, ..., l_n
         b_0, b_1, ..., b_n ]    (2.2)
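As a minimal illustration, a multidimensional series like 2.2 can be stored as a plain array with one row per component and one column per time step. All component values below are invented for illustration:

```python
import numpy as np

# Hypothetical evolution of three perceptual components over five equal
# time intervals. Rows: pitch (MIDI note numbers), loudness (0-1), and
# brightness (normalized spectral centroid). All values are invented.
pitch      = [60, 62, 64, 62, 60]
loudness   = [0.5, 0.6, 0.8, 0.6, 0.5]
brightness = [0.2, 0.3, 0.5, 0.3, 0.2]

# s[n] as in equation 2.2: one row per component, one column per sample.
s = np.array([pitch, loudness, brightness])

print(s.shape)   # (3, 5): 3 components, 5 samples
print(s[:, 2])   # the complete state at time step n = 2
```

Because all samples share the same duration, no explicit time axis is needed; column index alone encodes time.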
For the specific case where there is more than one instrument or sound
source, as in a three voice fugue, a chorale or even possibly an entire
orchestra, the series can again be extended as:
s[n] = [ p_{a,0}, p_{a,1}, ..., p_{a,n}
         l_{a,0}, l_{a,1}, ..., l_{a,n}
         b_{a,0}, b_{a,1}, ..., b_{a,n}
         p_{b,0}, p_{b,1}, ..., p_{b,n}
         l_{b,0}, l_{b,1}, ..., l_{b,n}
         b_{b,0}, b_{b,1}, ..., b_{b,n}
         ...
         p_{m,0}, p_{m,1}, ..., p_{m,n}
         l_{m,0}, l_{m,1}, ..., l_{m,n}
         b_{m,0}, b_{m,1}, ..., b_{m,n} ]    (2.3)
While this is a simple representation scheme, it is not necessarily the best
in all cases. The problem with this representation has to do primarily
with the temporal overlapping of events. Consider a polyphonic instrument such as the piano. With this instrument it is possible to articulate
multiple notes at the same time. How should we represent a sequence of
pitches that overlap and start and end at different times? An alternative
is to have a series that has as many dimensions as the number of keys in
the piano, where each dimension represents the evolution of the velocity
of the attack and release of each key:
s[n] = [ v_{1,0}, v_{1,1}, ..., v_{1,n}
         v_{2,0}, v_{2,1}, ..., v_{2,n}
         ...
         v_{m,0}, v_{m,1}, ..., v_{m,n} ]    (2.4)
How could we reduce the number of dimensions and still have a meaningful representation? A tempting idea might be to separate a piano piece
into multiple monophonic voices and to assign each voice to a dimension
in a multidimensional polymelodic series as in 2.3. But in addition to
being an arbitrary decision in most cases (not all piano pieces are conceived as a counterpoint of melodies), the main problem is that even in
single melodic lines there may still be overlapping notes through legatissimo articulation.
An alternative representation is the Standard Midi File (SMF) approach,
where time is included explicitly in the representation by indicating the
absolute position of each event in time or the time difference: the Inter
Onset Interval (IOI). In addition to this time information there are the
duration of each event and the component values. The most economical
form of this representation, Format 0, combines all separate sources or
instruments into a few dimensions, one of which specifies the instrument
to which the event corresponds. The following is an example of this
format, where ioi stands for Inter Onset Interval, d for duration, p for
pitch, v for velocity (or loudness) and ch for channel (or instrument):
s[n] = [ ioi_0, ioi_1, ..., ioi_n
         d_0,   d_1,   ..., d_n
         p_0,   p_1,   ..., p_n
         v_0,   v_1,   ..., v_n
         ch_0,  ch_1,  ..., ch_n ]    (2.5)
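The event-per-row layout of (2.5) can be made concrete with a small sketch. This is not actual SMF byte encoding, only the tabular structure; the note values and the `onsets` helper are our own illustration:

```python
# Hypothetical sketch of the event-per-row layout of (2.5):
# columns = (ioi, dur, pitch, vel, ch); values invented for illustration.
events = [
    (0.0, 1.0, 60, 80, 0),   # C4, held for one beat
    (0.5, 1.0, 64, 72, 0),   # E4 enters half a beat later and overlaps C4
    (0.5, 0.5, 67, 90, 1),   # G4 on a second channel/instrument
]

def onsets(events):
    """Recover absolute onset times by accumulating the Inter Onset Intervals."""
    t, out = 0.0, []
    for ioi, dur, pitch, vel, ch in events:
        t += ioi
        out.append(t)
    return out

print(onsets(events))  # [0.0, 0.5, 1.0]
```

Because each event carries its own IOI and duration, the second note can still be sounding when the third begins, which is exactly the overlap that the fixed-rate series of (2.2) cannot express in few dimensions.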
With this format we can now represent overlapping events with only a
few dimensions. This is an adequate representation for the piano, due
to the discrete nature and limited control possibilities of the instrument.
Yet, it is far from being a good numeric replacement for a traditional
music score in general, the main problem being the loss of independence
between the multiple parameters. Because the explicit representation of
time intervals can be useful, we can still take advantage of this feature by
defining pairs of dimensions for each parameter: one for the parameter
values and another for their durations. Besides providing some insight
into the rhythmic structure of a sequence, it can also significantly reduce
the number of data samples:
s[n] = [ p_0, p_1, ..., p_n
         d_0, d_1, ..., d_n
         v_0, v_1, ..., v_n
         d_0, d_1, ..., d_n ]    (2.6)
While this representation lends itself best for discrete data, we can assume some kind of interpolation between key points and still make it
useful for continuous data.
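The trade-off between the constant-rate series of (2.1) and the (value, duration) pairs of (2.6) can be sketched as a pair of conversions. This assumes durations are integer multiples of the sampling period; the helper names below are hypothetical:

```python
# A sketch (with hypothetical helper names) of moving between the
# (value, duration) representation of (2.6) and the constant-rate
# series of (2.1), assuming durations are integer sample counts.
def pairs_to_series(values, durations):
    """Expand (value, duration) pairs into a constant-rate series."""
    series = []
    for v, d in zip(values, durations):
        series.extend([v] * d)
    return series

def series_to_pairs(series):
    """Run-length encode a constant-rate series back into pairs."""
    values, durations = [], []
    for v in series:
        if values and values[-1] == v:
            durations[-1] += 1
        else:
            values.append(v)
            durations.append(1)
    return values, durations

pitches, durs = [60, 62, 64], [2, 1, 3]
s = pairs_to_series(pitches, durs)      # [60, 60, 62, 64, 64, 64]
assert series_to_pairs(s) == (pitches, durs)
```

Note that the pair representation stores six values where the expanded series stores six samples here, but for long held notes the reduction in data samples becomes substantial.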
Summary
There are multiple ways in which musical data can be represented. Different representations have different properties: some provide compact
representations, while others are more flexible. In addition, some may
be more informative about certain aspects of the music than others. The
choice of representation will also depend on the type of transformation
we are interested in applying to the data. In the present work we use
these three basic representations (constant sampling rate representation,
variable sampling rate representation, and SMF Format 0) in different
situations for the purpose of analysing and generating new musical sequences.
CHAPTER THREE
State Space Reconstruction and Modeling

3.1 State Spaces
One of the most important concepts in this work is that of state space
(or phase-space). A state space is the set of all possible states or configurations available to a system. For example, a six sided die can be in
one of six possible states, where each state corresponds to a face of the
die. Here the state space is the set of all six faces. As a musical example consider a piano keyboard with 88 keys. All possible combinations
of chords (composed from 1 to 88 keys) in this keyboard constitute the
state space, and each chord is a state. These two examples constitute
finite state spaces because they have a finite number of states. Some
systems may have an infinite number of states; for example a rotating
height adjustable chair. A chair that can rotate on its axis and that
can be raised or lowered continuously has an infinite number of possible
positions. Yet, in practical terms, many of these positions are so close to
each other that sometimes it makes sense to discretize the space and
make it finite (for example, by grouping all rotations within a range of
π radians into one state). While this system has an infinite number of
possible states, it has only two degrees of freedom: up-down motion and
azimuth rotation.
State spaces can be represented geometrically as multidimensional spaces
where each point corresponds to one and only one state of the system. In
the chair example, since only two degrees of freedom exist, we can represent the whole state space in a two dimensional plane. Yet, we might
like to keep the relations of proximity between states in the representation as well. Therefore we embed the two dimensional state space of the
rotating chair in a three dimensional space in such a way that the states
that are physically close to each other are also close in state space. This
results in a cylinder, and is a "natural" way of configuring the points in
the state space of the chair.
An analogous musical example is that of pitch space. This one-dimensional
space can be represented with a line, ranging from the lowest perceivable
pitch to the highest. But this straight line is not a good representation
of human perception of pitch in terms of similarity. Certain frequency
ratios like the octave (2:1) are perceived to be equivalent or more closely
related to each other than, for example, minor seconds. In 1855 Drobisch proposed representing pitch space as a helix, placing octaves closer
to each other than perceptually more distant intervals [35]. Thus, we
have a similar case to that of the chair, where a low dimensional space is
embedded in a higher dimensional Euclidean space to represent similar
states by proximity (Figure 3-1).
Figure 3-1: Helical representation of pitch space. A one-dimensional
line is embedded in a three-dimensional Euclidean space for a better
representation of pitch perception.
3.2 Dynamical Systems
We have discussed the notion of state space and have given a few examples of their representation. But for the case of music, it is the temporal
relationships between states that we are most interested in (i.e. their dynamics), rather than the states themselves. Given this interest, it comes
as no surprise that our first approach to studying the qualitative characteristics of motion in music comes from the long tradition of the study
of dynamical systems in physics. We will not discuss here the history
of this vast discipline, but only present some key ideas and important
results that will help us analyze, model and generalize musical structures
from which we will hopefully generate novel pieces of music.
3.2.1 Deterministic Dynamical Systems
We ought then to regard the present state of the universe as
the effect of its anterior state and as the cause of the one
which is to follow. Given for one instant an intelligence
which could comprehend all the forces by which nature is
animated and the respective situation of the beings who
compose it -an intelligence sufficiently vast to submit these
data to analysis- it would embrace in the same formula the
movements of the greatest bodies of the universe and those
of the lightest atom; for it, nothing would be uncertain and
the future, as the past, would be present to its eyes.
Pierre-Simon Laplace (Philosophical Essay on Probabilities,
II. Concerning Probability)
A deterministic dynamical system is one where all future states are
known with absolute certainty given a state at an instant in time. If
the future state of a system depends only on the present state, independently of the time at which the state is found, then the system is said
to be autonomous. Thus, the dynamics of an autonomous system are
defined only in terms of the states themselves and not in terms of time.
In the case of an m-dimensional state space, the dynamics of an autonomous deterministic system can be defined as a set of m first-order
ordinary differential equations for the continuous case (also called a flow)
[22]:
dx(t)/dt = f(x(t)),   t ∈ ℝ    (3.1)
or as an m-dimensional map for the discrete case:
x_{n+1} = F(x_n),   n ∈ ℤ    (3.2)
3.2.2 State Space Reconstruction
What if we know nothing about the nature of a system, and all we have
access to is one of its observables? Consider for example the flame of a
candle as our system of interest. Suppose that, similarly to Plato's cave
of shadows from his Republic, Book VII, we do not see the candle directly,
but only the motion of shadows of objects projected on the wall. Thus,
we have a limited amount of information about the candle's motion and
all its degrees of freedom. Can we infer the motion of the candle's flame
in its entirety from the motion of the shadows?
Suppose now that our observation is a piece of music; it is the shadow
moving on the wall. We now want to know what the nature of the "system" that generated the given piece of music is. Can we automatically
infer the hidden structure of the mind responsible for the generation of
the piece and, from this structure, generate other musical possibilities?
Evidently, in the context of this work, this example should not be taken
literally, but you get the idea.
We now state the question more formally. Given a series of observations
s(t) = h(w(t)) that are a function of the system W, is it possible to
reconstruct the dynamics of the state space of the system from these
observations? Can we find h^{-1}(s(t))? If s(t) is a scalar observation
sequence, is it possible to recover the high dimensional state space that
generated the observation? Takens' theorem [40] states that it is possible
to reconstruct a space X ∈ ℝ^m that is a smooth diffeomorphism
(topologically equivalent) of the true state space W ∈ ℝ^d by the method
of delays. This method consists of constructing a high dimensional
embedding by taking multiple delays of s(t) and assigning each delay to a
dimension in the higher dimensional space X:

x(t) = (s(t), s(t - τ), ..., s(t - (m - 1)τ))
     = (h(w(t)), h(w(t - τ)), ..., h(w(t - (m - 1)τ)))
     = H(w(t))
The theorem states that this is true if the following conditions are met:
1. System W is continuous with a compact invariant smooth manifold
A of dimension d, such that A contains only a finite number of
equilibria, contains no periodic orbits of period τ or 2τ, and contains
only a finite number of periodic orbits of period pτ, with 3 ≤ p ≤ m.
2. m > 2d + 1.
3. The measurement function h(w(t)) is C².
4. The measurement function couples all degrees of freedom.
The condition that m > 2d + 1 guarantees that the embedded manifold
does not intersect itself. In many cases a value of m smaller than 2d +
1 will work because the number of effective degrees of freedom of the
system may be smaller due to dissipation or correlation between the
dimensions in state space [37].
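The method of delays itself is straightforward to sketch. The function name and the ordering of the lag coordinates below are our own choices:

```python
# A minimal sketch of Takens' method of delays: embed a scalar series
# s into an m-dimensional lag space with delay tau. The function name
# and the orientation of the lag vectors are our own conventions.
def delay_embed(s, m, tau):
    """Return the lag vectors (s[n], s[n - tau], ..., s[n - (m-1)tau])."""
    start = (m - 1) * tau
    return [tuple(s[n - i * tau] for i in range(m))
            for n in range(start, len(s))]

s = [0, 1, 2, 3, 4, 5]
print(delay_embed(s, m=3, tau=1))
# [(2, 1, 0), (3, 2, 1), (4, 3, 2), (5, 4, 3)]
```

Each embedded point pairs a sample with its delayed copies, so a series of length n yields n - (m - 1)τ points in the reconstructed space.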
3.2.3 Calculating the Appropriate Embedding Dimension
How do we decide on a good dimension m for the embedding? How
do we know if the dimensionality we've chosen for the embedding is
high enough to capture all the degrees of freedom of the system? For
deterministic time series we want each state to have a unique velocity
associated with it, i.e. if x_n = x_k then F(x_n) = F(x_k). This implies
that for the case of continuous time series, trajectories should not cross
in state space. In other words, points that are arbitrarily close to each
other, both in space and time, will have very similar velocities.
Correlation Sum
A simple way of estimating a good embedding dimension for state space
reconstruction of deterministic systems is to find points that overlap or
that are closer than ε to each other. The correlation sum simply counts
the number of distances smaller than ε between all pairs of points (x_i, x_j):

C(ε) = 2/(N(N - 1)) Σ_{i=1}^{N-1} Σ_{j=i+1}^{N} Θ(ε - ||x_i - x_j||)    (3.3)
Here Θ(x) is the Heaviside function, which returns 1 if x > 0 and 0
if x ≤ 0. Throughout the text we use the Euclidean distance metric
||x - y|| = √((x_1 - y_1)² + ... + (x_m - y_m)²).
A good embedding dimension will be one where the correlation sum is
very small in proportion to the total number of points. Because it is
not clear what a good choice of ε is, the correlation sum is evaluated for
different values of ε and different embedding dimensions m. Specifically
for discrete series, with ε smaller than the quantization resolution δ, we
can count the number of points that fall in the same place. Thus, when
we want a strictly bijective relation between the embedded trajectory and
the observation sequence, we choose the smallest embedding dimension
where the correlation sum equals zero for ε < δ, i.e. C(ε < δ) = 0.
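A direct, if naive O(N²), transcription of (3.3) might look like this; the function name is ours, and the Heaviside count follows the strict inequality above:

```python
# A direct transcription of the correlation sum (3.3), assuming
# Euclidean distance between embedded points. O(N^2); fine as a sketch.
import math

def correlation_sum(points, eps):
    """Fraction of point pairs strictly closer than eps."""
    N = len(points)
    count = 0
    for i in range(N - 1):
        for j in range(i + 1, N):
            d = math.dist(points[i], points[j])
            if eps - d > 0:          # Heaviside(eps - ||x_i - x_j||)
                count += 1
    return 2.0 * count / (N * (N - 1))

pts = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]
print(correlation_sum(pts, 1e-6))   # only the coincident pair counts: 1/3
```

With ε below the quantization resolution, the sum is zero exactly when no two embedded points coincide, which is the bijectivity criterion described above.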
False Nearest Neighbors
A more sophisticated method for estimating the embedding dimension
is the false nearest neighbors method [23]. This algorithm consists of
comparing the distance between each point and its nearest neighbor in
an m-dimensional lag space to the distance of the same pair of points in
an (m + 1)-dimensional space. If the distance between the pair of points
grows beyond a certain threshold after changing the dimensionality of
the space, then we have false nearest neighbors. Kantz [22] gives the
following equation for their estimation:

X_fnn(r) = [ Σ_n Θ( ||x_n^{(m+1)} - x_{k(n)}^{(m+1)}|| / ||x_n^{(m)} - x_{k(n)}^{(m)}|| - r ) Θ( σ/r - ||x_n^{(m)} - x_{k(n)}^{(m)}|| ) ] / [ Σ_n Θ( σ/r - ||x_n^{(m)} - x_{k(n)}^{(m)}|| ) ]    (3.4)
where x_{k(n)}^{(m)} is the closest neighbor of x_n in m dimensions and σ is
the standard deviation of the data. The first Heaviside function in the
numerator counts the neighbors whose distance grows beyond the threshold
r, while the second one discards those whose distance was already
greater than σ/r. Cao [5] proposed a more elegant measure of false
nearest neighbors. He further eliminates the threshold value r by considering
only the average of all the changes in distance between all points as
the dimension increases, and then taking the ratio of these averages. He
defines the mean
E[m] = (1/(N - mτ)) Σ_{all n} ||x_n^{(m+1)} - x_{k(n)}^{(m+1)}|| / ||x_n^{(m)} - x_{k(n)}^{(m)}||    (3.5)
and E1(m) = E[m + 1]/E[m] as the ratio between the averages of consecutive dimensions. The ratio function E1(m) describes a curve that
grows slower as m increases, asymptotically reaching 1. The value of m
where the growth of the curve is very small is the embedding dimension
we are looking for.¹ Typically the value selected is slightly above the
knee of the curve (Figure 3-2).
Figure 3-2: Example of a typical behavior of E1(m). The function
reaches 1 asymptotically, and a good choice for the embedding dimension
is above the knee of the curve, where its velocity decreases
substantially.
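Cao's criterion can be sketched as follows. The embedding helper, the alignment of the m- and (m+1)-dimensional lag vectors, and the names `cao_E` and `E1` are our own; a production version would use a spatial index rather than the brute-force neighbor search:

```python
# A sketch of Cao's E1(m) criterion: embed the series in dimensions m
# and m+1, find each point's nearest neighbor in m dimensions, and
# average how much that neighbor distance grows when the dimension is
# raised. Names and the brute-force search are our own choices.
import math

def embed(s, m, tau):
    start = (m - 1) * tau
    return [tuple(s[n - i * tau] for i in range(m))
            for n in range(start, len(s))]

def cao_E(s, m, tau):
    xm = embed(s, m, tau)
    xm1 = embed(s, m + 1, tau)
    xm = xm[tau:]            # align: the (m+1)-dim embedding starts tau later
    total, count = 0.0, 0
    for i, p in enumerate(xm):
        # nearest neighbor of p in m dimensions, excluding p itself
        j = min((k for k in range(len(xm)) if k != i),
                key=lambda k: math.dist(p, xm[k]))
        dm = math.dist(p, xm[j])
        dm1 = math.dist(xm1[i], xm1[j])
        if dm > 0:
            total += dm1 / dm
            count += 1
    return total / count if count else float("nan")

def E1(s, m, tau):
    """Ratio of the mean distance growth between consecutive dimensions."""
    return cao_E(s, m + 1, tau) / cao_E(s, m, tau)
```

Plotting `E1(s, m, tau)` against m reproduces the knee behavior of Figure 3-2: once m is large enough to unfold the attractor, raising the dimension further barely changes the neighbor distances and the curve flattens toward 1.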
3.2.4 Calculating the Time Lag τ
While Takens' theorem tells us what the minimum embedding dimension
for state space reconstruction should be, it says nothing about how to
choose the delay τ for the embedding. In theory, the choice of τ is
irrelevant since the reconstructed state space is topologically equivalent
to the system's manifold. In practice, though, factors such as observation
noise and quantization make it necessary to find a good choice of τ for
model estimation and prediction.
To better understand what a good choice of τ would be, we take as
example one of the most popular systems in nonlinear dynamics: the
Lorenz attractor.

¹This assumes that the time series comes from an attractor. If the series is noise,
for example, E1(m) will never converge.

The dynamics of the system are defined as
dx/dt = σ(y - x)
dy/dt = ρx - y - xz
dz/dt = -βz + xy

Figure 3-3 shows 4000 points of the trajectory of this attractor, with
parameters σ = 10, ρ = 28, β = 2.666..., and initial position x = 0,
y = 0.01 and z = -0.01. Suppose our only observable is the x axis of
the system (Figure 3-4). We construct the lag space from this observable
as x_n = (x_n, x_{n-τ}, x_{n-2τ}).
Figure 3-5 shows the reconstructed
Figure 3-3: Lorenz attractor with parameters σ = 10, ρ = 28,
β = 2.666....
Figure 3-5 shows the reconstructed state space from x for different values
of τ. For τ = 1, all the points in the reconstructed space fall along the
diagonal. τ = 1 is clearly too small because the coordinates that compose
each point are highly correlated and almost identical. As τ is increased,
the embedding unfolds to reveal the structure of the attractor, which is
clearest for values of τ between 8 and 14. As τ continues to increase,
the geometry gradually distorts to an overcomplicated structure. While
visual inspection is a good way of estimating the delay parameter, in
many cases the dimensionality of the state space must be greater than
three (apart from the fact that we ignore the shape of the real state
spaces that generated the observation). Thus, a quantitative measure to
estimate the appropriate value of τ is required. The method we will use
is based on the above observation about the correlation that exists
between coordinates in the reconstructed state
Figure 3-4: Evolution of the x dimension (our observation) of the
Lorenz system. The abscissa shows the sample steps of the sequences
while the ordinate shows the actual value of the x axis of the Lorenz
attractor.
space. If coordinates are highly correlated, then the reconstructed trajectory will fall along the main diagonal of the space. This means that
we want to find a τ large enough to make the correlations minimal yet
small enough to avoid distortion of the reconstructed trajectory. A
measure of the mutual information between a series and itself delayed by τ
provides us with a reasonable answer to this question [24]. Let p_{s_n}(i) be
the probability that the series s[n] takes the value i at any time n, and
p_{s_n,s_{n-τ}}(i, j) be the probability that the series s[n] takes the values i at
time n and j at time n - τ, for all n. The mutual information is then:

I(s_n; s_{n-τ}) = Σ_{i,j} p_{s_n,s_{n-τ}}(i, j) log₂ [ p_{s_n,s_{n-τ}}(i, j) / (p_{s_n}(i) p_{s_{n-τ}}(j)) ]    (3.6)
Figure 3-6 is a plot of the mutual information of the observation x of
the Lorenz system as a function of τ. The first minimum of the mutual
information function is our choice of τ.
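For a discrete-valued series, (3.6) can be estimated by simple counting. The sketch below, with hypothetical names, also scans for the first local minimum over a range of delays:

```python
# A sketch of the delayed mutual information (3.6) for a discrete-valued
# series, used to pick tau as its first local minimum. Probabilities are
# estimated by counting; function names are our own.
import math
from collections import Counter

def mutual_information(s, tau):
    pairs = [(s[n], s[n - tau]) for n in range(tau, len(s))]
    pj = Counter(pairs)                  # joint counts of (s_n, s_{n-tau})
    pa = Counter(a for a, _ in pairs)    # marginal counts of s_n
    pb = Counter(b for _, b in pairs)    # marginal counts of s_{n-tau}
    N = len(pairs)
    return sum((c / N) * math.log2((c / N) / ((pa[a] / N) * (pb[b] / N)))
               for (a, b), c in pj.items())

def first_minimum(s, max_tau):
    """Return the first tau whose mutual information is a local minimum."""
    mi = [mutual_information(s, t) for t in range(1, max_tau + 1)]
    for t in range(1, len(mi) - 1):
        if mi[t] < mi[t - 1] and mi[t] <= mi[t + 1]:
            return t + 1                 # taus are 1-based
    return None
```

For a perfectly self-predictable delay (s[n] = s[n - τ] always), the mutual information equals the entropy of the series, and it drops toward zero as the delayed copy becomes independent of the original.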
3.3 Modeling

3.3.1 Spatial Interpolation = Temporal Extrapolation
A trajectory in the reconstructed state space informs us about states
visited by the system, as well as their temporal relationships. How do
we generate new trajectories (new state sequences) that have similar
dynamics to the reconstructed state space trajectory? What other state
sequences are implied by this trajectory? Given a new state x_n not found
Figure 3-5: Three-dimensional embedding of the observable x[n] by
the method of delays with different values of τ. From top to bottom,
left to right: τ = 1, τ = 2, τ = 4, τ = 8, τ = 14, τ = 20, τ = 32, τ = 64.
Figure 3-6: Estimation of the value for τ by mutual information. The
first local minimum is the best estimate of τ.
in the original state space trajectory, what would be the "natural" or
implied sequence of states x_{n+1}, x_{n+2}, x_{n+3}, ...? Paraphrasing in more
musical terms, if our state space is a reconstruction of an actual piece of
music, how do we generate new pieces of music that have similar motion
characteristics to the original piece, yet different at the same time? What
other musical sequences are implied by the structure of the embedding
of the original piece?
These questions are intimately related to the problem of prediction, and
in a reconstructed state space prediction becomes a problem of interpolation [39]. There are multiple ways to interpolate the space and different
modeling techniques can be used. We can consider a single global predictor valid for all xn, or a series of local predictors which are valid only
locally in the vicinity of the point of interest.
Local Linear Models
Probably the simplest interpolation method is the method of analogues
proposed by Lorenz in 1969 [25]. Let x_{(i)n} be the ith nearest neighbor
of x_n. The method of analogues consists of finding the nearest neighbor
x_{(1)n} to x_n, and equating x_{n+1} to x_{(1)n+1}. In other words,
F(x_n) = F(x_{(1)n}). An obvious improvement over this method is to
calculate x_{n+1} from a number of nearest neighbors. The number of
neighbors can be defined by a radius ε around the point x_n (in which case
this number would vary depending on the density of the state space around
x_n), or by choosing a fixed number of neighbors N. In both cases,
the combination of the neighbor points of x_n can be weighted by their
distances to x_n. An estimation of x_{n+1} from the weighted average of the
N nearest neighbors of x_n can be expressed as

F(x_n) = Σ_{i=1}^{N} F(x_{(i)n}) φ(||x_n - x_{(i)n}||)    (3.7)

where φ is a weighting function [25]. In our present implementation φ is
defined as

φ_i = δ_i / Σ_{j=1}^{N} δ_j,   with   δ_i = 1 / ||x_n - x_{(i)n}||^p

where p is a parameter used to vary the weight ratios of the states
involved. Expanding (3.7),

F(x_n) = φ_1 F(x_{(1)n}) + φ_2 F(x_{(2)n}) + ... + φ_N F(x_{(N)n})    (3.8)

φ(·) returns values between 0 and 1, and φ_1 + φ_2 + ... + φ_N = 1.
In other words, only the relative distances between point xn and its
N nearest neighbors are considered, not their absolute distance. As a
simple example of how this method performs, we calculate the flow of
the entire state space by applying this formula to equally spaced points
in the space. The two dimensional state space was constructed from
a sinusoid s[n] = sin(2π40n)·10 + 13 using the method of delays, with τ = 6.
Figure 3-7 shows the original time series s[n] and Figure 3-8 the result of
the interpolation of the reconstructed state space. In this figure we see
Figure 3-7: Series s[n] = sin(2π40n)·10 + 13, with 25 samples per cycle.
that all the points in state space converge to the original trajectory. In
terms of the generation of new trajectories with similar characteristics to
the original, this result is not useful. We want states in our reconstructed
space to behave similarly to their nearest neighbors, not to be followed
by the same states as their neighbors. Therefore, we modify our function
Figure 3-8: Flow estimation of state space by averaging the predictions
of the nearest neighbors. Parameters: N = 3, p = 2, τ = 6.
by replacing F(x_{(i)n}) with δx_{(i)n} = F(x_{(i)n}) - x_{(i)n}. Thus we calculate
the velocity of point x_n as the weighted average of the velocities of the
N closest points to it:

F(x_n) = x_n + Σ_{i=1}^{N} δx_{(i)n} φ(||x_n - x_{(i)n}||)    (3.9)
Figure 3-9 shows the estimated flow from this method.
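Equation (3.9) can be sketched directly. The function below treats the reference trajectory as a list of state vectors and uses the inverse-distance weights described above; the name and the zero-distance shortcut are our own:

```python
# A sketch of the neighbor-velocity predictor of (3.9): the velocity at a
# query point is the weighted average of the velocities of its N nearest
# neighbors on the reference trajectory, with 1/d^p weights as in the
# text. Function name and zero-distance handling are our own choices.
import math

def predict_next(x, traj, N=3, p=2):
    """traj is a list of state vectors; velocity at k is traj[k+1] - traj[k]."""
    candidates = range(len(traj) - 1)       # last point has no velocity
    nearest = sorted(candidates, key=lambda k: math.dist(x, traj[k]))[:N]
    dists = [math.dist(x, traj[k]) for k in nearest]
    if 0.0 in dists:                        # query lies exactly on trajectory
        k = nearest[dists.index(0.0)]
        vel = [b - a for a, b in zip(traj[k], traj[k + 1])]
    else:
        w = [1.0 / d ** p for d in dists]
        W = sum(w)
        vel = [sum(wi * (traj[k + 1][i] - traj[k][i])
                   for wi, k in zip(w, nearest)) / W
               for i in range(len(x))]
    return [xi + vi for xi, vi in zip(x, vel)]
```

Applying `predict_next` repeatedly from a state not on the original trajectory generates a new sequence that follows the local flow of its neighbors, which is exactly the behavior shown in Figure 3-9.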
When the observable signal s[n] is by nature discontinuous (Figure 3-10),
so too will the reconstructed state space trajectory be. In this case, using a large
N will smooth out the interpolated space, distorting the characteristic
angularity of the original trajectory (Figure 3-11). The interpolation
function is an averaging filter smoothing the state space dynamics. It
is basically a Moving Average (MA) filter with coefficients changing at
each new estimation point. Obviously, using only one nearest neighbor
will preserve the angularity of the trajectory since no filtering takes place
(Figure 3-12). A nice middle point between a totally curved (smooth)
space and a linear one is the use of a large number of nearest neighbors
N together with an equally high p exponent. This results in a generally
straight flow but with rounded corners (Figure 3-13).
The signals used in these examples are distant from actual music, but
their simplicity makes the understanding and visualization of the concepts presented easier. In Chapter 5 we will use real music; for now just
Figure 3-9: Flow estimation for the state space by weighted average
of nearest neighbor velocities. Parameters: N = 3, p = 2, τ = 6.
Figure 3-10: Signal s[n] = 5, 8, 11, 14, 11, 8, 5, 8, 11, 14, 11, 8, 5, ...
keep in mind that the observation signal s[n] can be any musical component
such as pitch, loudness, IOI, etc., or, as discussed in the previous
chapter, even multidimensional signals composed of all these parameters
simultaneously.
We have focused on a particular implementation of local linear models
as a way of generalizing the reconstructed state space dynamics. We
have also discussed the effect different parameter values have on the
state space interpolation. The implications these have on reconstructed
spaces from actual music will be discussed in Chapter 5.
Figure 3-11: Flow estimation of state space by nearest neighbor
velocities. Parameters: N = 8, p = 2, τ = 1.
Figure 3-12: Flow estimation of state space using a single nearest
neighbor (N = 1), preserving the angularity of the original trajectory.

Figure 3-13: Flow estimation of state space using a large number of
nearest neighbors N and an equally high p exponent, producing a
generally straight flow with rounded corners.
CHAPTER FOUR
Hierarchical Signal Decomposition

4.1 Divide and Conquer
Signal analysis can be characterized as the science (and art) of decomposition: how to represent complex signals as a combination of multiple
simple signals. From a musicological perspective, it can be thought of
as reverse engineering composition. The power and usefulness of having a signal represented as a combination of simpler entities is easy to
see. Once a decomposition is achieved, we can modify each component
independently and recompose novel pieces that may share similar characteristics with the original. Just as with modeling, the best decomposition
depends on its purpose. For the purpose of musical analysis, we would
like the decomposition to be informative of the signal at hand and for
it to be perceptually meaningful. For the purpose of re-composition, we
would like the decomposition to allow for flexible and novel transformations.
How can we decompose our signal in a meaningful way? Different characteristics of a signal may suggest different decomposition procedures.
Frequently, a signal can be separated into a deterministic and a stochastic component. More specific components might be a trend, which is the
long term evolution of the signal, or a seasonal: the periodic long term
oscillation [6]. Thus, multiple hierarchical overlapping structures may occur simultaneously. Particularly within the seven orders of magnitude
spanned by music (see Chapter 2), these structures can vary tremendously from one time scale to the next.
Because of the relevance of hierarchic structure to music and music perception, this chapter will focus on decomposition methods that reveal
structure at different time scales. Before discussing these methods, we
give a brief summary of some preliminary developments.
4.1.1 Fourier Transform
By far the most popular representation of a signal as a combination of
simple components is the Fourier transform. This transform consists of
representing a signal as a combination of sinusoids of different frequency,
amplitude and phase. While it is a powerful linear transformation, it's
not without its limitations. Because the basis functions of this transformation are the complex exponentials (e^{iωt}) defined for -∞ < t < ∞, the
transform results in a representation of the relative correlation of each
of the sinusoids over the whole signal, with no information about how
they might be distributed over time. This is fine for stationary signals,
but music is almost never stationary. Thus, the Fourier transform is not
an optimal representation of music.
4.1.2 Short Time Fourier Transform (STFT)
In 1946 Gabor proposed an alternative representation by defining elementary time-frequency atoms. In essence, his idea was to localize the
sinusoidal functions in order to preserve temporal information while still
obtaining the frequency representation offered by the Fourier transform.
This localization in time is accomplished by applying a windowing
function to the complex exponential:

g_{u,ξ}(t) = g(t - u) e^{iξt}    (4.1)

Gabor's windowed Fourier transform then becomes

Sf(u, ξ) = ∫ f(t) g(t - u) e^{-iξt} dt    (4.2)
The energy spread of the atom g_{u,ξ} can be represented by the Heisenberg
rectangle in the time-frequency plane (Figure 4-1). g_{u,ξ} has time width
σ_t and frequency width σ_ω, which are the corresponding standard deviations
[27]. While one would like the area of the atoms to be arbitrarily small
in order to attain the highest possible time-frequency resolution, the
uncertainty principle puts a limit on the minimum area of the atoms at
σ_t σ_ω ≥ 1/2.
Figure 4-1: A Heisenberg rectangle representing the spread of a Gabor
atom in the time-frequency plane.
4.1.3 Wavelet Transform
While the STFT offers a solution to the problem of decomposing
nonstationary signals, it still is not an optimal representation of our
data for the following reasons:
1. Many musical high level signals are discontinuous by nature. The
choice of a continuous basis is not necessarily the optimal choice.
2. At high levels of musical structure, the concept of frequency has
little or no meaning.
3. The STFT does not give us information about the structure of a
signal at different time scales.
The development of a method for multi-resolution analysis is necessary
when dealing with perceptually relevant data. Our hearing apparatus has
evolved to extract multi-scale time structures from sound. Inspired by
the function of the cochlea, Vercoe implemented multi-resolution sound
analysis in Csound [42] by exponentially spacing Fourier matchings. In
the early 1900s, Schenker proposed what is arguably his main contribution to music analysis: the abstraction of musical structures from
different Schichten, or layers at multiple scales [31]. Yet, Schenker's
analysis defines the multi-scale structures in terms of tonal harmonic
functions and certain specific voice leadings. Here we would like to extract the multi-scale representation more generally in terms of motion
and, in addition, automatically. Thus, we import the general purpose
tools traditionally used in the micro-world of sound to the macro-world
of form.
As a natural extension of the STFT, the wavelet transform was developed
as a way to obtain a more useful multi-resolution representation. Like the
STFT, the wavelet transform correlates a time-localized wavelet function
ψ(t) with a signal f(t) at different points in time, but in addition the
correlation is measured for different scales of ψ to achieve a multi-scale
representation. Thus, the wavelet transform of f(t) at a scale s and
position u is given by

Wf(u, s) = ∫ f(t) (1/√s) ψ*((t - u)/s) dt    (4.3)
Wavelets are equivalent to hierarchical low-pass and high-pass filterbanks
called quadrature mirror filters [18]. A signal is passed through both
filters. Then the output of the low-pass filter is again passed though
another pair of low-pass and high-pass filters and so on recursively (Figure 4-2). The outputs of the high-pass filters are called the details of the
signal and the outputs of the low-pass filters are called the approximations.1 The number of filter pairs used in this recursive process defines
the number of scales in which the signal is represented.
While the "Gabor atoms" of an STFT are windowed complex exponentials, the development of the wavelet transform has introduced a wide
variety of alternative basis functions. The first mention of wavelets is
found in a thesis by Haar (1909) [20]. The Haar wavelet is a simple
piecewise function
$$\psi(t) = \begin{cases} 1 & \text{if } 0 \le t < 1/2 \\ -1 & \text{if } 1/2 \le t < 1 \\ 0 & \text{otherwise} \end{cases}$$

that when scaled and translated generates an orthonormal basis:

$$\psi_{j,n}(t) = \frac{1}{\sqrt{2^{j}}}\,\psi\!\left(\frac{t - 2^{j}n}{2^{j}}\right), \qquad (j,n) \in \mathbb{Z}^{2}.$$
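As a quick numerical check of these definitions, the sketch below (our illustration, not from the thesis) samples ψ_{j,n} on a fine grid and verifies orthonormality with Riemann-sum inner products:

```python
import numpy as np

def haar(t):
    """The Haar mother wavelet."""
    t = np.asarray(t, dtype=float)
    return np.where((0.0 <= t) & (t < 0.5), 1.0,
                    np.where((0.5 <= t) & (t < 1.0), -1.0, 0.0))

def haar_jn(t, j, n):
    """Scaled and translated atom: psi_{j,n}(t) = 2**(-j/2) * psi((t - 2**j * n) / 2**j)."""
    return 2.0 ** (-j / 2.0) * haar((t - 2.0 ** j * n) / 2.0 ** j)

# Riemann-sum inner product on a fine grid covering the supports.
t = np.linspace(-4.0, 4.0, 800001)
dt = t[1] - t[0]
def inner(f, g):
    return float(np.sum(f * g) * dt)
```

Distinct atoms have (numerically) vanishing inner products, and each atom has unit norm, regardless of scale j or shift n.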
¹For a detailed discussion about the relationship between quadrature mirror filters and wavelets, see [27].
Figure 4-2: Recursive filtering of signal s[n] through pairs of high-pass
and low-pass quadrature mirror filters.
Because of the discontinuous character of the Haar wavelet, smooth functions are not well approximated. This limitation spurred the investigation of alternative wavelets, and many varieties with different properties have been invented since. Probably the most popular are the Daubechies family of wavelets. The Daubechies 1 wavelet basis is the same as Haar's, and the family becomes progressively smoother as its order increases.
How then do we choose a wavelet type? What is the best wavelet basis? Evidently, the definition of "best" depends on our goal. If our goal were compact representation, then we would want a basis that used the least number of wavelets without losing much information. The usefulness of this is obvious for compression and noise reduction, but the choice of a wavelet basis that provides the most economical representation may also help us understand the nature and intrinsic properties of a given signal. A common measure of how well a limited set of wavelets describes a given signal is the linear approximation error. Given a wavelet basis B = {φ_n}, a linear approximation s_M of a signal s is given by the M largest-scale wavelets [27]:
$$s_M = \sum_{n=0}^{M-1} \langle s, \phi_n \rangle\, \phi_n \qquad (4.4)$$
The accuracy of the approximation is typically measured as the squared
norm of the difference between the original signal and the approximation:
$$\epsilon[M] = \|s - s_M\|^{2} = \sum_{n=M}^{+\infty} |\langle s, \phi_n \rangle|^{2} \qquad (4.5)$$
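The error in equation 4.5 is easy to compute in practice: drop the detail coefficients and reconstruct. The sketch below is our illustration, using the orthonormal Haar transform rather than the Daubechies wavelets of the figures, and a synthetic stand-in for a pitch sequence; it also confirms that the reconstruction error equals the energy of the discarded coefficients:

```python
import numpy as np

def haar_wavedec(s, levels):
    """Orthonormal Haar analysis: returns (approximation, [details, fine to coarse])."""
    s = np.asarray(s, dtype=float)
    details = []
    for _ in range(levels):
        pairs = s.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))
        s = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    return s, details

def haar_waverec(approx, details):
    """Inverse transform: undo each split, coarsest level first."""
    s = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * len(s))
        out[0::2] = (s + d) / np.sqrt(2.0)
        out[1::2] = (s - d) / np.sqrt(2.0)
        s = out
    return s

pitch = 60.0 + 12.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, 512))  # toy "pitch" series
a, ds = haar_wavedec(pitch, 4)
s_M = haar_waverec(a, [np.zeros_like(d) for d in ds])   # keep the approximation only
err_direct = float(np.sum((pitch - s_M) ** 2))          # ||s - s_M||^2
err_coeffs = float(sum(np.sum(d ** 2) for d in ds))     # energy of dropped coefficients
```

`err_direct` and `err_coeffs` agree because the basis is orthonormal: the two expressions in equation 4.5 are the same number.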
Figures 4-3 and 4-4 show wavelet decompositions of Bach's Prelude from
Cello Suite no. 1. Both show the largest approximation and the details
at all four levels. For Figure 4-3 we used Daubechies 1 wavelet, and
Daubechies 8 for Figure 4-4. Measuring the approximation for s_M = a_4 in both cases with equation 4.5, we get ε = 117.2 with Daubechies 8 and ε = 122.69 with Daubechies 1. Thus, Bach's prelude is better approximated with the Daubechies 8 wavelet.

Figure 4-3: Pitch sequence of Bach's Prelude from Cello Suite no. 1 and its wavelet decomposition using Daubechies 1 wavelets. s is the original pitch sequence, a4 the approximation at level 4 and dn are the details at level n. The sequence is sampled every sixteenth note.

Notice, though, that the decompositions of the signal using Daubechies 8 are always smooth, while those with Daubechies 1 are not. If the discontinuous character of the prelude is important to preserve, then Daubechies 1 is a better choice of wavelet. Because our goal is to generate new pieces and not to compress them, this qualitative criterion is certainly more useful to us. In the next chapter we will exploit the continuous quality of Daubechies 8 to obtain musical variations with similarly smooth characteristics.
Figure 4-4: Pitch sequence of Bach's Prelude from Cello Suite no. 1 and its wavelet decomposition using Daubechies 8 wavelets. s is the original pitch sequence, a4 the approximation at level 4 and dn are the details at level n. The sequence is sampled every sixteenth note.
4.1.4 Data Specific Basis Functions
As we saw in Chapter 3, state space reconstruction by the method of delays allows us to uniquely represent each step in a given sequence as a function of multiple variables: s_n = h(x_n) = h(x_1, x_2, ..., x_m). h(x_n) must then be defined as a combination of the components of x_n, for example s_n = a_1(x_1) + a_2(x_2) + ... + a_m(x_m) or a_1(x_1)a_2(x_2)...a_m(x_m), or it may have one of many other forms. Can the structure of the manifold that results from the embedding tell us something about the structure of h(x_n) and its variables, and as a consequence about the observation signal s_n? It turns out that, similarly to the wavelet transform, a lag space representation automatically recovers structure at different time scales. This
is one of the most important observations regarding state space reconstruction by the method of delays. As an example, consider the following
signal: sin(2π10t) + sin(2π47t)/4 (Figure 4-5). This signal can be seen
as a fast oscillation "riding" on a slow oscillation. In a three-dimensional
lag space (Figure 4-6), this simple linear combination of two sinusoids
becomes a torus, and the two levels of motion (a global cycle and a local
cycle) are clearly distinguishable. In Chapter 3 we assumed all along
that the systems we were dealing with were autonomous, yet a state
space trajectory such as this one can be interpreted as an autonomous
dynamical system being driven by another system. How do we separate
these two systems? Because the state space is reconstructed by assigning
Figure 4-5: Top: sin(2π10t) and sin(2π47t)/4. Bottom: sin(2π10t) + sin(2π47t)/4.

Figure 4-6: Three dimensional state space reconstruction of sin(2π10t) + sin(2π47t)/4, with τ = 220.
delayed versions of a single signal to each coordinate, they all have the same information delayed by some time τ. In other words, projecting the
embedding onto any of the three dimensions will yield the same signal.
A careful observation of the torus in three dimensions reveals that by rotating the structure in such a way that the maximal variances are aligned with the axes of the space, the previously redundant dimensions become decorrelated, separating the two sinusoids almost completely (Figure 4-7). Figure 4-8 plots the two signals resulting from the projection on two of the three orthogonal axes after rotating the torus. The method of
Figure 4-7: PCA on the torus resulting from the state space reconstruction of sin(2π10t) + sin(2π47t)/4, with τ = 220.
Figure 4-8: Two component dimensions of the state space reconstruction of sin(2π10t) + sin(2π47t)/4 (with τ = 220) after PCA transformation.
delays therefore provides a means for decomposing our signal into a "natural" set of hierarchical basis functions. In other words, the functions
are derived from the data rather than selected a priori. The transformation that rotates the state space trajectory so that dimensions become
decorrelated is called Principal Component Analysis. In Chapter 3 we
saw that the interpolation of the state space manifold "sketched" by the
reconstructed trajectory is a way of generalizing the dynamics of a signal.
These recovered basis functions are the other half of the generalization
of the structure. These building blocks are independent entities that
can be modified and recombined to generate new state space geometries while preserving their identity (their "torusness"). For example, multiplying the amplitude of the first sinusoid by 5 results in the state space
trajectory shown in Figure 4-9. This is a simple example by which we
Figure 4-9: Three dimensional state space reconstruction of 5 sin(2π10t) + sin(2π47t)/4, with τ = 220.
illustrate concepts and tools for music structure generalization. In the
next chapter we will use them to generate new pieces of music.
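The separation described above is easy to reproduce numerically. Below is our sketch (not the thesis' code; we assume a 10 kHz sampling rate so that the delay τ = 220 is in samples) that embeds the two-sinusoid signal in three dimensions and applies PCA:

```python
import numpy as np

# The two-scale signal: a fast sinusoid "riding" on a slow one (10 kHz sampling assumed).
t = np.arange(0.0, 1.0, 1.0 / 10000.0)
s = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 47 * t) / 4.0

# Three-dimensional embedding by the method of delays, tau = 220 samples.
tau = 220
X = np.stack([s[2 * tau:], s[tau:-tau], s[:-2 * tau]], axis=1)

# PCA: rotate so that the axes align with the directions of maximal variance.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))  # eigenvalues in ascending order
Y = Xc @ eigvecs[:, ::-1]                        # largest-variance component first
```

The columns of Y are decorrelated, and the leading components recover scaled versions of the slow and fast oscillations.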
For completeness, we compare PCA with the wavelet transform using
again Bach's Prelude. We arbitrarily choose eight dimensions for the embedding of the Prelude, and τ = 1. Figure 4-10 shows the bases resulting from applying PCA to the lag space and projecting the trajectory
on each of the dimensions.
Figure 4-10: Pitch sequence of Bach's Prelude from Cello Suite no. 1 and each of the eight principal components extracted from an eight dimensional lag space. s is the original pitch sequence, p_n the nth principal component. The sequence is sampled every sixteenth note.
4.1.5 Principal Component Analysis (PCA)
The main purpose of Principal Component Analysis is that of decorrelating a set of correlated variables. As already suggested, the variables
we are interested in decorrelating are the dimensions of a lag space. We
want each of the dimensions to become as independent as possible so
that each provides unique information about the state space trajectory.
We have seen that a careful rotation of the lag space can separate the
different scale dynamics of a signal s[n]. But how do we do this mathematically? Essentially what we want is that the m dimensions that make
up the lag space trajectory be as independent and as informative as possible. PCA is the process of obtaining linear independence between the
m vectors z_1, z_2, ..., z_m, i.e. if a_1 z_1 + a_2 z_2 + ... + a_m z_m = 0 then a_i = 0 for all i. Statistically, m variables are linearly independent if their covariances equal zero. Given that the length-m observation vectors x are column vectors, their covariance matrix is defined as:
$$C_x = E[(\mathbf{x} - E[\mathbf{x}])(\mathbf{x} - E[\mathbf{x}])^{T}]$$
We want a transformation y = Mx such that C_y becomes diagonal, with all cross-covariances zero.
In relation to x, the covariance matrix of y is

$$\begin{aligned}
C_y &= E[(\mathbf{y} - E[\mathbf{y}])(\mathbf{y} - E[\mathbf{y}])^{T}] \\
&= E[(M(\mathbf{x} - E[\mathbf{x}]))(M(\mathbf{x} - E[\mathbf{x}]))^{T}] \\
&= E[M(\mathbf{x} - E[\mathbf{x}])(\mathbf{x} - E[\mathbf{x}])^{T} M^{T}] \\
&= M\,E[(\mathbf{x} - E[\mathbf{x}])(\mathbf{x} - E[\mathbf{x}])^{T}]\,M^{T} \\
&= M C_x M^{T}
\end{aligned}$$
Choosing the rows of M to be the eigenvectors of C_x makes C_y diagonal, rendering the dimensions of the lag space linearly independent.
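A quick numerical check of this result (our sketch, with synthetic data): build correlated variables, form M from the eigenvectors of C_x, and confirm that M C_x M^T is diagonal. Note that this yields a diagonal covariance; rescaling each row by the inverse square root of its eigenvalue (whitening) would make it the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 2000)) * np.array([[3.0], [1.0], [0.2]])
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
x = A @ z                            # three correlated row variables

Cx = np.cov(x)                       # 3x3 covariance matrix
eigvals, V = np.linalg.eigh(Cx)
M = V.T                              # rows of M are the eigenvectors of Cx
Cy = M @ Cx @ M.T                    # covariance of y = Mx
```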
Yet, it is important to be aware of the assumptions and limitations of this transformation [36].
1. Linearity. The state space manifold resulting from a time lag embedding may be (and many times is) curved and twisted. In order to avoid redundancy, a nonlinear transformation would be required prior to applying PCA. In the sum of sinusoids example, the components were by definition linearly independent, which allowed us to separate them well. Unfortunately, this is rarely the case.
2. The principal components are orthogonal. Related to the problem
of linearity, it is possible too that the points in lag space result in distributions whose axes are not perpendicular to each other.
Thus, while PCA may decorrelate some axes, it may not decorrelate
others.
3. Large variances reflect important dynamics. For the purpose of dimensionality reduction, PCA assumes that the important information is in the parameters (or dimensions) with the greatest variance,
while those with the least variance are discarded as insignificant or
noise. In the Bach example, the large scale dynamics are the ones
with the greatest variance, but this is not necessarily always the
case (although it usually is).
While PCA has its own limitations, it is still a useful transformation that
can allow us to extract and modify dynamics at different time scales.
Other more robust transformations attempt to achieve statistical independence between all the variables (p(x_i, x_j) = p(x_i)p(x_j)), rather than just linear independence. The approaches that fall into this category are
called ICA: Independent Component Analysis. We will not deal with
this family of transformations here.
4.2 Summary
The possibility of representing a signal as a combination of simpler basis
functions is a second level of generalization. This decomposition allows
us to transform the state space trajectory with much more flexibility
and, by implication, the original observable sequence as well.
Applying PCA on the lag space results in a set of useful basis functions that reveal hierarchical structures in a way similar to the wavelet transform. The simplicity of PCA relative to that of wavelets is another attractive
feature. On the other hand, having an analytical description of the
wavelet basis can give us more control over the type of decompositions
and transformations we may perform. In the following chapter we will
apply both transformations to concrete musical examples.
CHAPTER FIVE
Musical Applications
We have presented some theoretical background, tools and methods useful for the analysis, modeling and resynthesis of music, and are ready to
give them some concrete applications. We have also mentioned how each
of the methods provides some aspect of the generalization necessary for
an inductive model. We now summarize the three types of generalization
discussed:
1. State space interpolation (Chapter 3): The state space trajectory recovered by the method of delays serves as a kind of "wireframe" that suggests the existence of a full hyper-surface or manifold that can be estimated by interpolation. New trajectories that fall on the manifold and follow its flow constitute the generalization.
By following the flow, new pieces with similar dynamics and states
can be generated.
2. Abstraction: states vs. dynamics (Chapter 3): The recovered state space trajectory provides information about the states
visited by the trajectory and the dynamic relationships between
these states. Each of these two pieces of information can be separated and varied independently for the generation of new music.
3. Decomposition (wavelets, PCA, ICA) (Chapter 4): Either through wavelets or through PCA or ICA, it is possible to decompose a musical sequence into hierarchical, simpler, and meaningful components. These decompositions reveal embedded dynamics and provide an additional level of understanding of musical structure and flexibility for subsequent transformations.
All three aspects can be used as tools in music analysis and synthesis, and
may be combined in many different ways. Here, we can only give a few examples of their use and suggest some other possibilities. Sound files from the examples discussed in this chapter as well as additional relevant material can be found at http://www.media.mit.edu/~vadan/msthesis.
5.1 Music Structure Modeling and Resynthesis

Combined Components vs. Independent Components in Multidimensional Signals
When considering the multiple components of a piece of music, we can
opt to model their dynamics independently or collectively. Keeping things separate gives us more control over the variety of transformations we can apply to the music, but modeling all components collectively allows us to generate new trajectories that preserve the aggregate dynamics of the original training piece. The state space reconstruction of these two approaches is depicted in Figure 5-1.
Figure 5-1: Top: The aggregate dynamics of all the musical components such as pitch, loudness and IOI are modeled jointly by constructing a single state space. Bottom: Multiple state spaces are reconstructed and modeled from each component independently.
For the purpose of state space reconstruction, a multidimensional signal

$$\mathbf{s}[n] = \begin{bmatrix}
s_{1,0} & s_{1,1} & \cdots & s_{1,n} \\
s_{2,0} & s_{2,1} & \cdots & s_{2,n} \\
\vdots & \vdots & & \vdots \\
s_{l,0} & s_{l,1} & \cdots & s_{l,n}
\end{bmatrix} \qquad (5.1)$$
can be treated in essentially the same way as a scalar signal. Each
point in state space is defined as a set of delays of each of the signal's
components:
$$\begin{aligned}
\mathbf{x}[n] = (\,&s_1[n], s_1[n-\tau], \ldots, s_1[n-(m-1)\tau], \\
&s_2[n], s_2[n-\tau], \ldots, s_2[n-(m-1)\tau], \ldots, \\
&s_l[n], s_l[n-\tau], \ldots, s_l[n-(m-1)\tau]\,) \qquad (5.2)
\end{aligned}$$
The reconstructed state space has dimension lm. As can be seen in the previous equation, the resulting state space is an aggregate of the individual state spaces of each component in the signal s[n]. Thus, a different τ can be chosen for each component s_i[n]. Since the value ranges of the components s_i[n] will usually differ widely, it is important to whiten the reconstructed state space to avoid having dimensions that are too flat relative to others. Having different variances for the different dimensions would distort the map or flow estimation, since our weighting and averaging functions rely on Euclidean distances. After estimating a new trajectory, we must de-whiten the data.
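A minimal sketch of this aggregate embedding plus whitening (our helper names, not the thesis' code):

```python
import numpy as np

def embed_multi(components, m, taus):
    """Aggregate lag-space embedding of a multidimensional signal (eq. 5.2).
    `components` is a list of equal-length 1-D arrays (e.g. pitch, loudness,
    IOI); `taus` gives one delay per component.  Returns shape (points, l*m)."""
    n = len(components[0])
    start = max((m - 1) * tau for tau in taus)
    cols = []
    for s, tau in zip(components, taus):
        s = np.asarray(s, dtype=float)
        for k in range(m):
            cols.append(s[start - k * tau : n - k * tau])
    return np.stack(cols, axis=1)

def whiten(X):
    """Zero mean, unit variance per dimension, so that no component
    dominates the Euclidean distances used when weighting neighbors."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def dewhiten(Xw, mu, sigma):
    """Map an estimated (whitened) trajectory back to the original ranges."""
    return Xw * sigma + mu
```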
5.1.1 Interpolation
As we reviewed in Chapter 3, the type of interpolation we perform on the
reconstructed state space can greatly affect the qualitative characteristics
of the estimated flow. Thus, a careful choice of the state space model
parameters can be crucial to obtain good musical output. For the case of
the local linear models discussed in Chapter 3, there are two parameters
to consider: the number of nearest neighbors and the p exponent in
Equation 3.8 (i.e. the weight of each neighbor as a function of their
distance). In addition to whatever modeling technique we use, there is the choice of embedding dimension and the initial state x_0 for the estimation of a new trajectory.
As a concrete example of how the different parameters affect the interpolation, and thus the outcome of new musical pieces, we use Ligeti's Piano Etude no. 4, Book 1 (Figure B-1). We combine pitch, loudness, IOI and duration in a single lag space and estimate a new trajectory
for multiple combinations of the parameters already discussed. For each
new trajectory estimated we systematically vary one parameter value at
a time. The embedding dimensions explored are 2, 4, 6, 8 per parameter. So, given that we are embedding four parameters together, the lag
spaces actually have 8, 16, 24, and 32 dimensions. The number of nearest neighbors used are 2, 4, 6, and 8, and values of the exponent p are 2,
16, 64, and 128, As the initial condition we always use the mean of the
space. Figure B.3 in Appendix B shows the new generated sequences in
pianoroll representation. All sequences are 500 notes long.
Number of Neighbors and the p Exponent From the pianoroll
plots of the generated pieces we see that for very high values of p, particularly p = 128, the resulting sequences are almost identical to some
fragment of the original Etude. This is because for such high values
of p the nearest neighbor to the point being estimated will have much
greater weight in defining its behavior than the more distant points.
Thus, no matter how many neighbor points are considered in the estimation, having p = 128 is practically the same as considering only one
nearest neighbor. The resulting trajectories are then transposed copies
of the original trajectory. The other problem with using only one neighbor or too high a value for p is that, particularly for embeddings with
insufficient dimensions, it is very likely that the estimated trajectory will
fall in an infinite loop. In Figure B.3 we see several instances of this.
The second half of panels two and three (from left to right, top to bottom), as well as panel eight are examples of this. Ideally we want to
use several neighbors for the interpolation to be more accurate. But,
as discussed in Chapter 3, the use of more than one nearest neighbor
results in smoothed out maps, distorting the originally angular quality
of discontinuous dynamics. Thus, a careful balance between the number
of nearest neighbors and the values of the exponent p must be found in
order to produce novel yet sharp interpolations. Naturally, if the original series were continuous, then the choice of the number of neighbors would not be a problem.
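The interplay between the neighbor count k and the exponent p can be seen in a toy implementation of the weighting scheme (a sketch: we assume Equation 3.8 weights each neighbor in inverse proportion to its distance raised to the power p):

```python
import numpy as np

def weighted_neighbor_estimate(X, Y, x0, k, p):
    """Estimate the image of state x0 from its k nearest neighbors in X,
    weighting each neighbor's known image Y[i] by 1 / distance**p.
    Large p collapses onto the single nearest neighbor; small p averages
    (and smooths) over all k neighbors."""
    d = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(d)[:k]
    # Work in log space so that a large p cannot overflow: w_i ~ d_i**(-p).
    logw = -p * np.log(np.maximum(d[idx], 1e-300))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return w @ Y[idx]
```

With p = 128 the estimate is effectively the nearest neighbor's image (hence the transposed copies of the original trajectory); with p = 2 several neighbors contribute, smoothing the map.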
Dimensionality of the Embedding In Chapter 3, we discussed the
requirement of finding a high enough dimension for the embedding to be
able to capture all the degrees of freedom of a continuous autonomous
system, and for a bijective relation to exist between the observation sequence s and the state space trajectory x. What are the implications of
the embedding by the method of delays on discontinuous series? Mainly
that trajectories may cross without violating the bijective relation between the state space trajectory and the observation sequence because
trajectory crossing of discontinuous series does not imply an overlap of
points. Thus, the embedding dimension for discontinuous sequences can
be smaller and still be good for the generation of new sequences. From
the plots in Figure B.3 we can see that, in general, large scale dynamics
from low dimensional embeddings (2 or 4) are rather chaotic compared to those of high dimensional ones.¹ To understand why this is the case,
consider the nature of the reconstruction by the method of delays. Each
state in lag space is defined as a set of ordered values taken from the
observation sequence. If the embedding dimension m = 2, we define
each state as a set of two values in the observation. Setting τ = 1, for example, defines each state as two successive values of the observation sequence. Choosing a four dimensional embedding would give us a map of all the four-value combinations found in the sequence. Evidently,
one state might be visited more than once if the dimensionality is not
high enough. It should be clear that the higher the dimensionality of the
space, the sparser it will be. If the dimension is too high, very few points
will occupy the space, and different trajectory segments will be at a big
distance from each other, making their interaction for the generation of
new trajectories more difficult. If the dimensionality is small, the space
will be very dense and points may fall on top of each other. Here, new
trajectories will be estimated from several trajectory segments that are
close to each other in the embedding, but not necessarily close temporally in the original sequence. Thus, these points may have very different
velocities.
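The density argument can be made concrete by counting how often the same lag-space state recurs as the dimension grows (a toy sketch of ours, with a short synthetic sequence):

```python
from collections import Counter

def state_collisions(seq, m, tau=1):
    """Count repeat visits to lag-space states (m-tuples of delayed values):
    a rough proxy for how dense the embedding is at dimension m."""
    states = [tuple(seq[i - k * tau] for k in range(m))
              for i in range((m - 1) * tau, len(seq))]
    return sum(c - 1 for c in Counter(states).values() if c > 1)

seq = [0, 1, 0, 2, 0, 1, 0, 3]           # a short discrete sequence
collisions = [state_collisions(seq, m) for m in (1, 2, 3, 4)]  # [4, 2, 1, 0]
```

Raising the dimension thins out the space until every state is visited only once; too high a dimension leaves the points isolated, too low a dimension piles them on top of each other.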
Initial State This is the most difficult parameter to estimate in terms
of the predictability of the output. Evidently, choosing any of the points
that fall in the embedded trajectory as initial state will result in the
original music sequence from that point on. The actual estimation of
new states begins when the end of the trajectory is reached. In time
series prediction this is the natural point to begin the estimation of a
new trajectory. But for the purpose of generating novel trajectories
without forecasting motivations, any point in the state space that does
not fall on the original trajectory will do. A reasonable thing to do might
be to select the initial state at random, but for the purpose of evaluating
¹This is similar to what happens with Markov models, where small order models result in sequences that are locally similar to the original but globally unrelated, while higher order models yield sequences that preserve the structures of the original piece at higher scales.
parameter values and dimension sizes, it is best to choose a constant
initial state.
There is only so much we can do by modeling all the components together. The lack of control over the independent parameters and the limited number of variables in the local linear models used make it difficult to produce novel pieces that are clearly different from the training piece.
Because the training piece in its entirety is modified by only a few parameters, the new generated pieces are coarse variations on the original.
In order to produce more refined transformations, we must make use of
the decomposition methods discussed in the previous chapter.
5.1.2 Abstracting Dynamics from States
Rotations
A geometric interpretation of the reconstructed state space suggests a
variety of transformations of the musical data. Simple transformations
like rotations have been used as generators of musical variation [43, 16,
1]. In fact, the classical transformations of inversion, retrograde and
retrograde inversion found in western music since the 16th century can
be understood as four simple space-time transformations: 180° rotations in two-dimensional lag space and temporal reversal.
In Chapter 4 we reviewed how PCA on a reconstructed state space is
nothing more than a special purpose rotation for decorrelating the component dimensions. But more generally, rotations can be interpreted as
a way of changing the states visited by a dynamical system while keeping
the dynamics unchanged. Thus, a state space trajectory defines a set of
possible state sequences that share the same dynamics.
If the purpose of the embedding is the application of geometric transformations exclusively, then the requirements for a proper embedding
for state space reconstruction (see Chapter 3) can be ignored, and all
dimensions m > 2 are useful. In this interpretation the dimension of the
embedding defines the possible degrees of transformation. The number
of degrees of freedom of a rotation is defined by the dimensionality of the
space. For an m-dimensional space, there are C(m) planes of rotation
[2]. Thus, higher dimensional spaces have the potential for a greater
variety of transformations.
A rotation is a linear combination of the m elements defining each point in lag space. For example, given a series s[n], we embed it by the method of delays with a delay of τ and dimension m = 3. We get the embedded trajectory x[n], so that each point in the m-dimensional space is x_n = (s_n, s_{n−τ}, s_{n−2τ}). If we apply a rotation of angle θ about the s_{n−2τ} axis of the embedding space, we get:
$$\mathbf{x}'_n = \begin{bmatrix} s_n & s_{n-\tau} & s_{n-2\tau} \end{bmatrix}
\begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
= (\cos\theta\, s_n - \sin\theta\, s_{n-\tau},\ \sin\theta\, s_n + \cos\theta\, s_{n-\tau},\ s_{n-2\tau}) \qquad (5.3)$$
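Equation 5.3 translates directly into code (a sketch; `rotate_embedding` is our helper name):

```python
import numpy as np

def rotate_embedding(s, tau, theta):
    """Embed s in three dimensions by the method of delays, then rotate by
    `theta` about the s_{n-2*tau} axis, exactly as in Equation 5.3."""
    s = np.asarray(s, dtype=float)
    # Rows are (s_n, s_{n-tau}, s_{n-2*tau}).
    X = np.stack([s[2 * tau:], s[tau:-tau], s[:-2 * tau]], axis=1)
    R = np.array([[np.cos(theta),  np.sin(theta), 0.0],
                  [-np.sin(theta), np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    return X @ R   # each row is a rotated point of the trajectory
```

Projecting any column of the result back onto a single dimension gives a new sequence whose values are linear combinations of the original values rather than permutations of them.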
These rotations can be applied to the lag space of any musical component independently or to that of the combined components. In the latter case the components will be mixed, so that pitch, for example, will
be a combination of pitch, rhythm, etc.

Figure 5-2: First six measures of Bach's Courante from Cello Suite no. 4.

As an example of rotating the
lag space of a single component we take the first six measures of Bach's
Courante from Cello suite no.4. We consider only one component: the
Inter Onset Interval (which in this case is the same as the duration of
each note).² We embed the sequence in three-dimensional lag space (Figure 5-3). We explore all 64 combinations of 90° rotations in this three-dimensional space and plot the projections of each rotation. As can be seen in Figure 5-4, many resulting patterns are identical. Only six patterns are actually different (Figure 5-5). Thus, with 90° rotations we obtain only five new sequences out of a total of 64 rotations. We are curious to know what other patterns might be obtained from other rotations, so we compare the projections of all 512 45° rotations. Out
²While the trill "embellishment" in measure 4 is not written out explicitly, it is an important element of the rhythmic sequence. Thus, in the electronic score we write it out as 32nd notes.
Figure 5-3: Three-dimensional embedding of the IOI of the first six measures of Bach's Courante from Cello Suite no. 4.
Figure 5-4: Projections of all 64 90° rotations of the three-dimensional embedding of the IOI from Bach's Courante.
Figure 5-5: The six distinct patterns resulting from the 64 90° rotations of the three-dimensional IOI embedding.
of 512, 26 (including the original) are distinct. Figure 5-6 is a plot of all 26 non-repeating patterns. Figure 5-7 shows patterns no. 1 and no. 25 in musical notation. They are approximations, quantized to the 64th note, particularly in cases where no clear rational proportion was found. From our interpretation, the rhythmic pattern in Bach's Courante is
Figure 5-6: Non-repeating patterns resulting from all 512 45° rotations of the three-dimensional IOI embedding. Pattern number 14 is Bach's original rhythmic sequence.
one of a family of patterns that share a common structure. Here we
see one of the advantages of considering musical space as a continuum
rather than a discrete alphabet. We have generated new durations not
present in the original sequence, as well as new sequences that have a
similar structure to the original. Because these new sequences are combinations of the values of an original sequence, this generative approach
Figure 5-7: Musical notation of patterns no. 1 and no. 25 from the 26 non-repeating patterns resulting from 45° rotations of the embedded IOIs.
can be called combinatorial, as opposed to systems based on discrete Markov models such as the one we discussed in Chapter 1, which are typically permutational. For an additional example in pitch space see http://www.media.mit.edu/~vadan/msthesis.
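The collapse from 64 rotation combinations to a handful of distinct patterns has a group-theoretic side that is easy to verify numerically (our sketch, not the thesis' code): composing quarter-turn rotations about the three axes can only ever produce the 24 proper rotations of the cube, so at most 24 of the 64 combinations (and, depending on symmetries of the data, far fewer) can yield different projected sequences.

```python
import numpy as np
from itertools import product

def axis_rotation(axis, quarter_turns):
    """Integer rotation matrix for quarter_turns * 90 degrees about one coordinate axis."""
    theta = quarter_turns * np.pi / 2.0
    c, s = int(round(np.cos(theta))), int(round(np.sin(theta)))
    R = np.eye(3, dtype=int)
    i, j = [(1, 2), (0, 2), (0, 1)][axis]
    R[i, i], R[i, j] = c, -s
    R[j, i], R[j, j] = s, c
    return R

# All 4**3 = 64 combinations of quarter turns about the three axes.
combos = [axis_rotation(0, a) @ axis_rotation(1, b) @ axis_rotation(2, c)
          for a, b, c in product(range(4), repeat=3)]
distinct = {tuple(R.ravel()) for R in combos}   # 24 distinct matrices
```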
5.1.3 Decompositions
The top portion of Figure 5-8 shows a fragment of Ligeti's Etude no. 4, Book 1 in pianoroll representation.³ The Etude is composed of two layers. One consists of an ostinato characterized by an ascending pattern
with a regular rhythm. The other is a homorhythmic polyphonic texture.
Throughout the piece, the ostinato is transposed up and down in jumps
of one or more octaves. We want to transform Ligeti's Etude so that its
global pitch dynamics become continuous. For this we wavelet transform the pitch sequence of Ligeti's Etude using the Daubechies 8 wavelet. The decomposition is done at 7 levels. After decomposing the signal we perform a 4-dimensional state space reconstruction of the largest scale approximation and rotate it by π radians on each of the six rotation planes.
We then project the rotated trajectory back to a single dimension and
inverse wavelet transform it to recover the complete pitch sequence of
the Etude. Figure 5-9 depicts this process. Because the Daubechies 8
wavelet is smooth, the resulting transformation is smooth as well. The
bottom of Figure 5-8 depicts the result of the transformation.
³This pianoroll plot was generated using the MATLAB MIDI Toolbox [13].
Figure 5-8: Top: Fragment from Ligeti's Piano Etude no. 4, Book 1. Bottom: Etude no. 4 after a π radians rotation of the state space reconstruction of the 7th level approximation using the Daubechies 8 wavelet.
Figure 5-9: Diagram of the transformation process used to vary and render continuous the large scale pitch contour of Ligeti's Etude. First, the pitch sequence is discrete wavelet transformed. Then, the largest scale approximation is embedded in three-dimensional lag space and rotated to obtain a variation of the original trajectory. Finally, after replacing the original approximation with the variation, the decomposition is inverse discrete wavelet transformed.
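The decompose-rotate-reconstruct loop of Figure 5-9 can be sketched in a few lines. This is a hedged illustration, not the thesis's implementation: it substitutes a Haar wavelet for Daubechies 8 so the transform stays self-contained, applies the rotation in a single coordinate plane rather than all of them, and the function names (`haar_dwt`, `rotate_lag_embedding`) are our own.

```python
import numpy as np

def haar_dwt(x, levels):
    """Multi-level Haar DWT; len(x) must be divisible by 2**levels.
    Returns the coarsest approximation and the per-level details."""
    a, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        details.append(d)
    return a, details

def haar_idwt(a, details):
    """Invert haar_dwt by interleaving averages and differences back up."""
    for d in reversed(details):
        up = np.empty(2 * len(a))
        up[0::2] = (a + d) / np.sqrt(2)
        up[1::2] = (a - d) / np.sqrt(2)
        a = up
    return a

def rotate_lag_embedding(a, dim=4, angle=np.pi, plane=(0, 1)):
    """Delay-embed a 1-D series (lag 1), rotate the trajectory about its
    centroid in one coordinate plane, and project back onto one coordinate."""
    m = len(a) - dim + 1
    emb = np.stack([a[i:i + m] for i in range(dim)], axis=1)
    i, j = plane
    R = np.eye(dim)
    R[i, i] = R[j, j] = np.cos(angle)
    R[i, j], R[j, i] = -np.sin(angle), np.sin(angle)
    c = emb.mean(axis=0)
    rotated = (emb - c) @ R.T + c
    out = np.asarray(a, dtype=float).copy()
    out[:m] = rotated[:, 0]        # projection back to one dimension
    return out
```

A varied pitch sequence is then `haar_idwt(rotate_lag_embedding(approx), details)`, i.e. the rotated approximation recombined with the untouched details.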
5.2 Outside-Time Structures
In this work we have focused our attention on the dynamic properties
of music, and have pointed out the advantage of considering the musical
space as a continuum rather than a discrete alphabet. But in addition
to the temporal aspects of music, a lot of the quality of a piece comes
from its "outside time" structures [43] as well. These consist of the sieves
through which the different musical components are discretized. For example, a piece sieved through a major scale feels very different from the
same piece sieved through a Phrygian scale. While the hard distinction
between in-time and outside-time may sometimes be inaccurate (modes
are a case where the sieve is dynamically defined), the distinction provides a simple way of modeling music structure.
5.2.1 Sieve Estimation and Fitting
For both practical and perceptual reasons, it is often useful to consider scales as cyclic structures. In Chapter 3 we mentioned
how the helical pitch space is a better representation of pitch perception
than a straight line. By collapsing the helix into a circle we can reduce
the infinite number of steps in the helical staircase to a compact set
of equivalent pitch classes. Indeed, in musical pitch theories such as
Forte's [17] or Estrada's [14], there is the notion of octave equivalence.
This notion allows the reduction of the pitch space into a small set of
pitch classes. This is simply a modulo operation on the pitch space, and
mod 12 is the modulo typically used for the twelve tone equal tempered
scale. Similarly, we may define a time sieve in the rhythmic dimension
(IOIs) via a modulo. If there exists a clear beat or higher rhythmic
structure like a meter, then we can reduce all the IOIs to a time sieve
modulo the meter. Thus, the same helical structure used for pitch space
can be used to represent similarity (or equivalence) between events in
relation to their placement in time. Figure 5-10 depicts an example of
this representation for a triple-time meter.
Here we take a simple statistical approach to sieve modeling by computing a histogram of the values found in each of the musical components.⁴
Essentially, the histogram of each component is the model of the sieve.
To sieve a newly generated trajectory we take two measurements into
consideration:
⁴ Contrary to traditional tonal music theory, where a scale in pitch space is defined by the underlying functional harmony, all pitches found in a piece are considered part of the scale.
Figure 5-10: A helical time line representing equivalent points in a measure in triple-time.
1. The relative frequencies rf(E_k) and rf(E_{k+1}) of each of the two scale values E_k and E_{k+1} surrounding the point s_n in the new series, i.e. E_k < s_n < E_{k+1}.

2. The relative distance between the point to be fitted, s_n, and each of its surrounding scale values, i.e. ||E_k - s_n|| and ||E_{k+1} - s_n||.

The complete rule used to replace the estimated value s_n by that of the sieve is:

    s_n ← E_k      if  rf(E_k)^p / ||E_k - s_n||  ≥  rf(E_{k+1})^p / ||E_{k+1} - s_n||,
    s_n ← E_{k+1}  otherwise.
We use the exponent p to adjust the weights of the relative frequencies
versus the distances. If the distance between the generated point sn and
its surrounding sieve points is considered to be more important than
their relative frequencies, then we use a low value for p and vice versa.
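As a concrete sketch of this fitting rule (the function name `sieve_fit`, the clipping at the sieve boundaries, and the small `eps` guard against zero distances are our own additions):

```python
import numpy as np

def sieve_fit(s, sieve, rf, p=2.0, eps=1e-9):
    """Snap a generated value s to one of its two surrounding sieve values.

    sieve: sorted array of sieve (scale) values E_0 < E_1 < ...
    rf:    relative frequency of each sieve value (the histogram model).
    p:     exponent trading off frequencies against distances.
    """
    k = int(np.clip(np.searchsorted(sieve, s) - 1, 0, len(sieve) - 2))
    lo, hi = sieve[k], sieve[k + 1]
    # Score each neighbor by frequency**p over distance; keep the larger.
    if rf[k] ** p / (abs(lo - s) + eps) >= rf[k + 1] ** p / (abs(hi - s) + eps):
        return lo
    return hi
```

With a low p the nearer sieve value wins; with a high p the more frequent one does, matching the role of p described above.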
5.3 Musical Chimaeras: Combining Multiple Spaces
An interesting problem in automatic music generation is that of combining two or more pieces to generate a new one. This is not a trivial task,
and in the context of the present work the most natural thing to do is
to combine the reconstructed state spaces. There are three basic ways
of doing this:
70
Musical Applications
Method 1: Generate a new trajectory from the average of the flows
estimated for each of the embeddings:
z_{n+1} = a F(x_n) + b F(y_n).    (5.4)
In this approach we estimate the behavior of a point in each of the separately reconstructed state spaces. The estimated velocities are then
averaged to obtain the behavior of the new point resulting from the
combination. The nature of this method suggests considering the combination of two pieces as a mixture where we can control the percentages
of each piece in the mix. For example, in a two piece mixture, we could
combine 90% of piece 1 and 10% of piece 2 simply by weight the estimated velocity vectors for each piece: zn+ 1 = 0.9F(xn)+0.1F(y,). As in
method 1, all the embeddings must have the same number of dimensions.
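A minimal sketch of method 1, under the assumption that F is estimated by an inverse-distance-weighted average of nearest-neighbor velocities in each reconstructed space (the thesis's exact estimator may differ, and `knn_flow`/`mix_step` are hypothetical names):

```python
import numpy as np

def knn_flow(embedding, point, k=4, p=2.0, eps=1e-9):
    """Estimate the flow F at `point` in one reconstructed state space as
    the inverse-distance-weighted mean velocity of its k nearest states."""
    states, nexts = embedding[:-1], embedding[1:]
    d = np.linalg.norm(states - point, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] ** p + eps)
    return ((nexts[nn] - states[nn]) * w[:, None]).sum(axis=0) / w.sum()

def mix_step(z, emb_x, emb_y, a=0.5, b=0.5, k=4):
    """One step of Eq. 5.4: advance z by a weighted mix of the two flows."""
    return z + a * knn_flow(emb_x, z, k=k) + b * knn_flow(emb_y, z, k=k)
```

Iterating `mix_step` from an initial point and reading off one coordinate per step yields the mixed observation sequence; a = 0.9, b = 0.1 gives the 90%/10% blend mentioned above.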
Method 2: Estimate a new trajectory from the superposition of the embedded trajectories:
z_{n+1} = F(x_n, y_n).    (5.5)
All spaces must have the same dimensionality. The dimensionality necessary to capture the degrees of freedom of one piece will usually be
smaller than that necessary for all the pieces combined. Thus, we estimate the embedding dimension necessary for all the combined pieces
by taking them as a single sequence. Once the embedding is made, we iteratively compute z_{n+1}.

For methods 1 and 2 we can ask the following questions: how should the
spaces be overlapped? Should they be overlapped without alteration or
should they be transformed in some way; maybe translated so that they
share the same centroid or rotated using PCA so that their principal
components are aligned? Again, the goal of merging two or more pieces
is an important determinant of the approach to be taken. If our goal
is to keep each of the training pieces as clearly perceivable as possible,
then the geometric transformations just mentioned must be avoided or
kept to a minimum.
Method 3: Construct a state space from both pieces directly:
x_n = (s1_{n-(m-1)τ}, s2_{n-(m-1)τ}, s1_{n-(m-2)τ}, s2_{n-(m-2)τ}, ..., s1_{n-τ}, s2_{n-τ}, s1_n, s2_n)    (5.6)
In this approach, instead of superposing two m-dimensional spaces, we create a new (l+n)-dimensional space by combining an l-dimensional and an n-dimensional space. Thus, in this method the number of dimensions for the reconstructed state space of each piece may be different. While in methods
1 and 2 each lag space dimension is a combination of both training
pieces, in method 3 half of the dimensions are defined by one piece and
half by the other. Thus, each point in the lag space is defined by both
pieces. The difference is important because our new observation will be
a projection of the lag space onto one dimension per component. What
dimension should this be? Should it belong to training piece 1 or piece
2? The piece to which the projection dimension belongs will dominate the new output. This can be seen as the carrier piece, while the
other is the modulator. As discussed in Chapter 4, this approach can be
interpreted as two systems driving each other.
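The interleaved lag vector of Eq. 5.6 can be constructed directly. This sketch assumes the same number of lags m and lag τ for both series, even though, as noted, the two pieces may in general get different dimensionalities; the function name is ours.

```python
import numpy as np

def interleaved_embedding(s1, s2, m=3, tau=1):
    """Lag vectors whose coordinates alternate between two observation
    series: (s1[n-(m-1)tau], s2[n-(m-1)tau], ..., s1[n], s2[n])."""
    s1, s2 = np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)
    n0 = (m - 1) * tau
    n_points = min(len(s1), len(s2)) - n0
    cols = []
    for lag in range(m - 1, -1, -1):          # oldest lag first
        idx = np.arange(n_points) + n0 - lag * tau
        cols.extend([s1[idx], s2[idx]])
    return np.stack(cols, axis=1)             # shape (n_points, 2*m)
```

Each row is one point defined jointly by both pieces; projecting the estimated trajectory onto an s1 coordinate makes piece 1 the carrier and piece 2 the modulator, or vice versa.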
To show how each of these methods performs, we combine two pieces
with each of the three methods. We are particularly interested now in
trying to preserve clearly perceivable characteristics from the two training pieces. Thus, we do not alter the embeddings of any of the pieces
in any way prior to their combination. The pieces we use are Ligeti's
Piano Etudes 4 and 6 from Book 1. Again, in order to see how these
three methods behave under different parameter configurations, we systematically explore several values for the three parameters: embedding
dimension, the number of nearest neighbors, and the weight of the
neighbors relative to their distance (the p exponent). We generate 500
points (notes) for each test, again using the mean of the reconstructed
trajectory as the initial point.
Method 1 This method seems to be the least likely to work. Looking at Figure B.4 and listening to the sequences generated with this
method we see that there is little resemblance between these and the
original Etudes. The pieces generated are noisy. In other words, there
is little structure and the music sounds quite random. This is especially
true for low dimensional embeddings. As the embedding dimension increases, though, the global dynamics become more varied and interesting.
Yet, the local dynamics are usually very irregular and the source pieces
imperceptible. Imagine embedding many pieces of music, each in their
own state space. What is the velocity of a single point in each of the
spaces? We will most likely discover these velocities to be very different.
Averaging these estimated velocities will then result in unpredictable
dynamics and possibly neutralization due to opposing velocities.
Method 2 The choice of the number of nearest neighbors in the lag
space interpolation is important in defining the mixture of the pieces.
If only one neighbor is used, the dynamics of each point will be defined
by only one of the pieces being combined. The greater the number of
nearest neighbors, the greater the chances that the behavior of a point
will be defined by points from the two training pieces. The p exponent
equally affects the mixture. As we have seen, too high a value of p will
have practically the same result as using only one nearest neighbor since
the weights of distant points will be very small compared to those of
nearby points.
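The effect of p is easy to see numerically; `neighbor_weights` is a toy helper of ours, not a routine from the thesis:

```python
import numpy as np

def neighbor_weights(dists, p):
    """Normalized inverse-distance weights, w_i proportional to 1/d_i**p."""
    w = 1.0 / np.asarray(dists, dtype=float) ** p
    return w / w.sum()
```

With neighbor distances (0.5, 1.0, 2.0), p = 2 gives weights of roughly (0.76, 0.19, 0.05), while p = 200 puts essentially all of the weight on the nearest neighbor, degenerating to 1-nearest-neighbor interpolation.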
Figure 5-11 is a pianoroll plot of a 500-note sequence generated from the mixture. The parameters used to generate this result were p = 200, number of neighbors = 11, and embedding dimension = 3.

Figure 5-11: Mixture of Ligeti's Etudes 4 and 6, Book 1, using method 2, with parameters p = 200, number of neighbors = 11, and embedding dimension = 3.

What is
characteristic about this method is the fact that it tends to behave like a
collage, deserving the chimaera alias (this can clearly be seen in Figure 5-11). This is because in the superposition of the two state spaces, the
trajectories from both pieces will not always be close to each other.
At some point in the combined space, trajectories will diverge, while at
others they will cross or pass close to each other. Thus, at some moments
the new estimated trajectory will be dominated by one piece; at places
where the trajectories of the training pieces meet there will be a mixture
and/or a shift of dominance in the estimation from one piece to the other,
making the resulting sequence a kind of splice and mix, and reordering
of fragments of the training pieces. Naturally, the degree to which pieces
will be combined depends on their degree of similarity. Similar states
between pieces will be close to each other, while states that are not
similar will be apart. It is important, then, to find a representation that
will make the spaces overlap as much as possible, but, again, without
altering them since we want to be able to hear the original training pieces
in the new piece. Therefore, we modify the components of the sequence (pitch, loudness, IOI, duration) of the training pieces so that they match as much as possible without altering perceptual features. For example, a histogram of the IOIs of Etude 4 shows that the most frequent IOI value is 0.5 (a quarter note), while that of Etude 6 is 0.25 (an eighth note)
(Figure 5-12).

Figure 5-12: Histograms of the IOIs of Ligeti's Etudes 4 and 6.

Because the difference can be interpreted simply as that
of tempo (which generally is not a defining characteristic of the piece unless the difference between tempi is too great), we scale the IOIs of one of the Etudes to equal that of the other. A similar transformation can
be applied to the pitch component. If the register is not something we
may consider very important, we can use the sequence of pitch intervals
(the Inter Pitch Interval) simply by taking the difference of the pitch
sequence. We don't do this in the present example though, keeping the
original pitch sequences unchanged.
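The IOI alignment just described (scale one piece so its modal IOI matches the other's) and the register-free interval transform both take a line or two each; the function names here are our own:

```python
import numpy as np

def modal_value(v):
    """The most frequent value in a sequence."""
    vals, counts = np.unique(v, return_counts=True)
    return vals[np.argmax(counts)]

def align_ioi(ioi_a, ioi_b):
    """Scale piece A's IOIs so its modal IOI equals piece B's, i.e. a
    tempo change that leaves the rhythm's internal proportions intact."""
    return np.asarray(ioi_a, dtype=float) * (modal_value(ioi_b) / modal_value(ioi_a))

def pitch_intervals(pitches):
    """Register-free representation: successive pitch differences."""
    return np.diff(np.asarray(pitches, dtype=float))
```

For example, a quarter-note-dominated IOI sequence (modal value 0.5) scaled against an eighth-note-dominated one (modal value 0.25) is simply halved.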
The dimensionality of the superimposed spaces is also an important factor defining the degree to which the new estimated trajectory will be a
combination of the two training trajectories. Since each point is defined
by a sequence of notes in the training pieces, the higher the dimensionality of the state spaces, the less likely it will be that points from different
pieces will be close to each other. Thus, the best results are obtained
when the dimensionality of the embedding is low. On the other hand,
as we saw before, choosing too low a dimensionality will make the high
scale dynamics of the resulting trajectory more chaotic.
Method 3 The third method is probably the most intriguing. As in
method 1, the generated sequences resulting from the combination of
both pieces into a single state space are generally noisy. A thought experiment might help visualize what's happening. Recall that the embedding
of a sum of sinusoids by the method of delays results in a toroidal structure (Section 4.1.4). In lag space these two sinusoids can be interpreted
as two signals driving each other, in a similar way as planets pull and
affect each other's motion. Including a third dimension with its own
dynamics will add an additional degree of freedom and complexity to
the system. By the Central Limit Theorem, the resulting dynamics will tend to gaussian noise as the number of dimensions with independent dynamics tends to infinity.
Thus, if the dynamics of the training pieces are complex, the dynamics
resulting from their combination will be even more complex, and the
result will be noise. This method is effective only when the spaces to
be combined are relatively simple. Figure B.7 in Appendix B shows
the result of combining Ligeti's Etudes with this method for multiple
parameter values. Clearly, the resulting trajectories from this mixture are overly noisy. But rather than embedding all the parameters of both pieces in a single space, we can embed each of the four components (pitch, loudness, IOI and duration) independently (see bottom of Figure 5-1) and combine the spaces corresponding to the same component (Figure 5-13). Figure B.7 shows several 500-note sequences generated with this
configuration for the same parameter combination as with methods 1
and 2.
The choice of the exponent p greatly affects the output of the interpolation using method 3. From the multiple panels in the figures it can
be seen that no matter what the choice of dimension or number of nearest neighbors is, when p = 2 the resulting sequence is noisier than with
higher exponents. In method 3, p is clearly the most important factor
determining the degree to which the points interact, and thus, the degree
to which the pieces mix. It could be labeled the "mixing level knob" of
the method.
Figure 5-13: Combining the state spaces corresponding to the same component of the two pieces, with each component embedded independently.
What State Spaces Are Made Of
In addition to the alternative ways of combining state spaces, there are
also multiple ways of defining the spaces to be combined. We have jointly
modeled the dynamics of all the components of a piece (e.g. pitch, loudness, brightness, IOI), and each component independently. We've also
seen that we can decompose each component and model each of its subcomponents independently. In addition, we can reconstruct a state space
from combinations of subcomponents derived from corresponding components of different pieces. For example, the large scale approximations
of pitch from various pieces on one side, and their corresponding details
on another (Figure 5-14). We could also take the high scale dynamics
from a component in one piece and the low scale dynamics from another,
or even combine different components from different pieces. These more
intricate and experimental approaches may result in unique variations,
but the original training pieces will most certainly be lost perceptually.
Figure 5-14: Corresponding components from different pieces are combined at different scales, but each scale is independent from the others.
CHAPTER SIX
Experiments and Conclusions
6.1 Experiments
In this work we have presented methods to generalize and extrapolate
musical structures from existing pieces for the purpose of generating
novel music. Thus, what we might evaluate is the novelty of the pieces
generated with these methods in relation to the source or training pieces.
But testing whether a piece generated by the system is different from
the training piece is trivial. As we showed in the previous chapter we
can generate completely new sequences with enough transformations,
however arbitrary these may be. What we are actually looking for is that
ambiguous middle ground where the generated pieces share structural
characteristics with the training piece(s) while at the same time being as
novel as possible. Clearly, this is as much a test on the system as it is
on the system's user, since the art still is, after all, in the creative use of
the tools.
6.1.1 Merging Two Pieces
In the previous chapter we discussed three ways of combining two pieces
of music to generate new ones. In this chapter we present the results of
an experiment performed to evaluate how well each of the three music-merging methods is able to generate novel musical pieces while still preserving structural characteristics of the original training pieces.
Experiment Setup
In this experiment subjects were presented with 10 excerpts: 5 pairs,
each generated with a different method:
Pair 1 Using method 1 with a single state space reconstruction.
Pair 2 Using method 2 with a single state space reconstruction.
Pair 3 Using method 3 with a single state space reconstruction.
Pair 4 Using method 3 with components modeled independently.
Pair 5 Randomly, using a gaussian distribution for each musical component. The parameters of the gaussian distributions used are shown
in Table 6.1.
Table 6.1: Parameters of the Gaussian distributions used for the randomly generated musical sequences.

              μ      σ
IOI           0      0.25
duration      0.25   0.5
pitch         60     10
loudness      80     10
We used two excerpts per generation method because each method can
generate a wide variety of musical sequences. Using only one excerpt may
give misleading results that depend more on the particular example than
on the generation method. On the other hand, using more excerpts might
have made the experiment too long for subjects to remain interested and
attentive during the experiment.
The two pieces used as training data were Ligeti's Etudes 4 and 6, Book
I. In all cases the generated sequences (including the random ones) were
quantized to fit the union of the sieves of the two training pieces (see
Section 5.2). The reason for including randomly generated excerpts was
to test whether listeners confused some of the generated pieces with
noise. As we pointed out in the previous chapter, method 3 and to a lesser
degree method 1 (both with a single state space reconstruction combining
all musical components) generate noisy sequences. All excerpts were
between 30 and 40 seconds in duration except the training pieces, which
were presented complete.
The experiment was divided in two parts: in the first part, subjects
were asked to rate the complexity and their preference for each excerpt.
They were also asked to cluster the excerpts into groups on the basis of
similarity. Subjects were free to create any number of clusters between
1 and 5, and their choice depended on their ability to perceive distinct
groups. Evidently, the perfect ear would have perceived the five groups
corresponding to the five generation methods used. In the second part
they were asked to evaluate the similarity between each of the excerpts
and each of the training pieces. The excerpts were labeled [F1], [F2],
[F3], etc. in the forms used and were organized as shown in Table 6.2.
The forms used in the experiment can be found in Appendix C.
Table 6.2: Labels used for the excerpts in the experiment form and their corresponding Pair type (i.e. production method).

F1   excerpt from Pair 1
F2   excerpt from Pair 2
F3   excerpt from Pair 3
F4   excerpt from Pair 5
F5   excerpt from Pair 4
F6   excerpt from Pair 5
F7   excerpt from Pair 1
F8   excerpt from Pair 2
F9   excerpt from Pair 4
F10  excerpt from Pair 3
Experiment Part 1 Results
None of the subjects perceived five distinct groups: 30.77% of the subjects distinguished four groups, 46.15% distinguished three, and 23.08%
distinguished two. How were the excerpts grouped? Were the two excerpts generated with the same method (and thus belonging to the same
Pair) actually clustered together? Table 6.3 shows the answer for each
subject to this question. For convenience, the bottom line in the table
shows the number of clusters found by each subject. Because all of the
subjects perceived four clusters or less, there is necessarily some overlap
of Pair types in every subject. Clearly, the fewer the number of clusters
found, the greater the chances that two Pairs will be clustered together.
Subject no.2, for example, correctly clustered together the excerpts
belonging to Pairs 3, 4 and 5, but only defined two clusters total. This
means that at least two of these Pairs were clustered together. Which
were clustered together? More formally, if Pair P = {p1, p2} and Pair Q = {q1, q2}, the question then is: (p1 = q1) ∧ (p1 = q2) ∧ (p2 = q2)?

Table 6.3: Were the two excerpts generated with the same method clustered together (1=yes, 0=no)?

Subject no.:      1  2  3  4  5  6  7  8  9 10 11 12 13
Pair 1:           1  0  1  1  0  1  0  0  0  0  0  1  0
Pair 2:           1  0  1  0  0  1  0  0  0  1  1  0  0
Pair 3:           0  1  0  0  0  1  0  1  0  0  0  0  1
Pair 4:           1  1  1  0  0  0  1  1  0  1  1  0  0
Pair 5:           1  1  0  1  0  1  0  0  0  1  0  1  0
No. of Clusters:  4  2  3  4  4  2  3  4  3  3  2  3  3
Table 6.4 shows the results of this operation applied to every pair of Pairs for each subject.

Table 6.4: Which of the Pairs were consistently clustered together (1=yes, 0=no)?

Subject no.:      1  2  3  4  5  6  7
Pairs 1 and 2:    0  0  0  0  0  1  0
Pairs 1 and 3:    0  0  0  0  0  1  0
Pairs 1 and 4:    0  0  0  0  0  0  0
Pairs 1 and 5:    0  0  0  0  0  0  0
Pairs 2 and 3:    0  0  0  0  0  1  0
Pairs 2 and 4:    0  0  0  0  0  0  0
Pairs 2 and 5:    0  0  0  0  0  0  0
Pairs 3 and 4:    0  0  0  0  0  0  0
Pairs 3 and 5:    0  1  0  0  0  0  0
Pairs 4 and 5:    0  0  0  0  0  0  0
No. of Clusters:  4  2  3  4  4  2  3

In this Table we see that subject no.2 clustered together Pairs 3 and 5. We can see that very few Pairs were consistently clustered together: Pairs 3 and 5 for subject no.2, as we have just seen, and Pairs 1, 2 and 3 for subject no.6.
To see to what degree the clusters defined by the subjects were a combination of excerpts belonging to different Pairs, we asked the question: is any of the two excerpts belonging to Pair P = {p1, p2} clustered with any of the two excerpts belonging to Pair Q = {q1, q2}? More formally, (p1 = q1) ∨ (p1 = q2) ∨ (p2 = q1) ∨ (p2 = q2)? Table 6.5 shows the results.

Table 6.5: Is any of the two excerpts belonging to Pair P = {p1, p2} clustered with any of the two excerpts belonging to Pair Q = {q1, q2}?

Subject no.:      1  2  3  4  5  6  7  8  9 10 11 12 13
Pairs 1 and 2:    0  1  0  1  1  1  1  1  1  1  1  1  1
Pairs 1 and 3:    1  1  0  1  1  1  1  1  1  1  1  1  1
Pairs 1 and 4:    0  1  0  1  0  1  1  1  1  0  0  1  1
Pairs 1 and 5:    0  1  1  0  1  0  1  1  1  0  1  0  1
Pairs 2 and 3:    0  1  1  1  1  1  1  1  1  0  1  1  1
Pairs 2 and 4:    0  1  0  1  1  1  1  1  1  0  0  1  1
Pairs 2 and 5:    0  1  1  0  1  0  1  1  1  0  0  0  1
Pairs 3 and 4:    0  0  1  1  0  1  0  0  1  1  1  1  1
Pairs 3 and 5:    1  1  1  0  1  0  1  1  1  0  1  0  1
Pairs 4 and 5:    0  0  0  0  1  0  1  1  1  0  1  0  1
No. of Clusters:  4  2  3  4  4  2  3  4  3  3  2  3  3

Looking at the Tables we can see the variety of the clustering results across subjects. Figure 6-1 shows graphically the clusters made
by a few subjects. On one extreme we have subject no.1 who found
four clusters and correctly grouped the excerpts of Pairs 1, 2, 4 and 5.
He/She correctly isolated Pairs 4 and 2, but clustered one of the excerpts
belonging to Pair 3 with Pair 1, and the other one with Pair 5. On the
other extreme we see Subject no.13. He/She found only two clusters,
and only correctly clustered together the excerpts belonging to Pair 3.
The excerpts belonging to the rest of the Pairs were divided in separate
clusters. Subject no.10 also did a good job in clustering the excerpts.
He/She was able to discriminate the noise excerpts (Pair 5) from the
rest, but confused excerpts from Pair 1 with those of Pair 2, and those
from Pair 3 with Pair 4.
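The two clustering questions posed above reduce to simple membership tests. A sketch: the cluster assignment below is hypothetical, while the excerpt labels follow Table 6.2 (Pair 1 = {F1, F7}, Pair 3 = {F3, F10}).

```python
def same_pair_clustered(assign, pair):
    """Table 6.3's question: are both excerpts of one Pair in the same cluster?"""
    p1, p2 = pair
    return assign[p1] == assign[p2]

def pairs_overlap(assign, pair_p, pair_q):
    """Table 6.5's question: is any excerpt of P clustered with any of Q?"""
    return any(assign[p] == assign[q] for p in pair_p for q in pair_q)
```

Here `assign` maps each excerpt label to a subject's cluster id; running these predicates over all subjects and all pairs of Pairs reproduces the 0/1 entries of Tables 6.3 and 6.5.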
From our own observations of each of the three music-merging methods presented in the previous chapter, we were expecting subjects to
frequently confuse the randomly generated excerpts (Pair 5) and those
generated with method 3 on a single state space reconstruction (Pair 3).
Adding across subjects in Table 6.5 we see that Pair 5 was clustered with
Pair 1 by 8 subjects, with Pair 2 by 7 subjects, with Pair 3 by 9 subjects
Figure 6-1: Clusters found by several Subjects.
and with Pair 4 by 6 subjects. These results match our hypothesis: Pair
5 was most confused with Pair 3. However, the confusion between the
gaussian noise excerpts (Pair 5) and those belonging to the rest of the
Pairs was more frequent and homogeneous than we expected.
We also expected subjects to clearly distinguish the excerpts generated
with method 3 with components modeled independently (Pair 4) from
the rest of the groups. Again, adding across subjects in Table 6.5 we see
that Pair 4 was clustered together with Pair 1 by 8 subjects, with Pair
2 by 9 subjects, with Pair 3 by 8 subjects and with Pair 5 by 6 subjects.
Thus, the confusion of Pair 4 with the rest of the Pairs was also very
homogeneous.
Why did we obtain such fuzzy results? Of course, since we discretized
all excerpts with the same sieves, they all share the same "outside time"
structure. This common structure no doubt plays a big role in the
subjects' perception of similarities between the noise excerpts and the
training based ones. It is likely that the longer the musical excerpts, the
more subjects rely on these "outside time" structures (which are actually
statistics abstracted from temporal placement) for evaluating structural
similarities. Thus, it is likely that the sieves have a greater weight on
the perception of similarity between the excerpts than what we thought.
How would the results change if we used shorter excerpts? What if we
had not sieved the gaussian excerpts? How distinguishable would they
be in this case? These questions will have to be left open for future
experiments.
The less relevant but nonetheless interesting results were subjects' perception of complexity and preference for each of the generated excerpts.
Figure 6-2 shows the resulting means (bars) and standard deviations
(lines) of complexity and preference across subjects. The most preferred
excerpts were 5, 8 and 9, which belong to Pairs 4, 2 and 4 respectively.
Even so, there is a wide variance among subjects in their perception of
both complexity and preference. There isn't a clear relationship between preference and complexity, but it's curious that the most preferred excerpt is also the most complex.

Figure 6-2: Means and standard deviations across subjects of their evaluation of Complexity and Preference for each of the excerpts.

Experiment Part 2 Results

To evaluate how well each of the generated excerpts preserved structural characteristics of the original training pieces, subjects were asked to rate the similarity between each of the generated excerpts and each of the two training pieces. Figure 6-3 shows the means (bars) and standard deviations (lines) of the estimates across subjects.

Figure 6-3: Means and standard deviations across subjects of the similarity estimates between each of the excerpts and each of the training pieces: Ligeti's Etude 4 and Etude 6.
The similarity estimates between the generated excerpts and Ligeti's
Etude 6 are fairly clear. With a relatively small variance, excerpt number
5 comes first with the highest similarity estimate followed closely by
excerpt 8. As we see in Table 6.2, excerpt 5 belongs to Pair 4 which was
generated with method 3, parameters modeled independently. Excerpt
8 belongs to Pair 2 which was generated with method 2, components
modeled jointly. It is interesting to note that these two excerpts coincide
with the two most preferred. How were the other excerpts belonging to
these pairs rated? The second excerpt belonging to Pair 4 is excerpt
number 9, which is the third most similar to Etude 6. Strangely, the
second excerpt belonging to Pair 2 (excerpt number 2) has the second
lowest similarity to Etude 6.
How can we explain these results? As discussed in Chapter 5, in contrast
to methods 2 and 1, method 3 is characterized by a dominance of one
piece. How one piece dominates over another depends on the state space
dimensions chosen for the projection of the novel trajectory. For the
generation of the excerpts using this method we projected the newly
generated trajectory onto the dimensions defined by the components
of Etude 6. It is not strange then that Pair 4 resulted in the highest
similarity estimate to Etude 6. Method 2 can best be described as a
"collage" method because many times a new state space trajectory will
be defined by one of the pieces for a period of time until an intersection of
trajectories belonging to different pieces is reached, in which case there
may be a shift from one piece's dominance to the other. The reason
why excerpt 2 may have done so poorly is that the initial condition in
the generation of this excerpt may have fallen close to the state space
trajectory of Etude 4 and never shifted to the trajectory of Etude 6. In
other words, this excerpt fell in a basin of attraction of Etude 4 and
never left it.
The similarity estimates between the generated excerpts and Ligeti's
Etude 4 are not so clearly different. The highest similarity scores were
given to excerpts 7 and 8, which belong to Pairs 1 and 2, followed by
excerpt 2 belonging also to Pair 2. The lowest similarity estimates were
given for excerpts 3 and 5, which came high in similarity with Etude
6. Excerpt 3 was generated with method 3 with components modeled jointly, and excerpt 5 with method 3 with components modeled independently.
Which excerpt is the most similar to both training pieces? To answer
this question we calculated the mean across subjects of the product of
the similarity estimates between a given excerpt and Etude4, and the
same excerpt and Etude6. More formally, let sim(a, b) be the similarity
estimate between a and b. Then, the similarity between excerpt x and
both training pieces is sim(x, Etude4) - sim(x, Etude6). Figure 64 shows these results. Excerpt 8 is clearly the excerpt with the most
similarity to both Etude 4 and Etude 6. It belongs to Pair 2, and was
thus generated with method 2.
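The combined score just described can be sketched as follows. The ratings here are invented for illustration only, not the experiment's actual data; the scale and subject count are assumptions.

```python
def combined_similarity(sim_etude4, sim_etude6):
    """sim_etude4[s][x] is subject s's similarity rating of excerpt x to Etude 4.

    For each excerpt, multiply each subject's two ratings and average the
    products across subjects."""
    n_subjects = len(sim_etude4)
    n_excerpts = len(sim_etude4[0])
    scores = []
    for x in range(n_excerpts):
        products = [sim_etude4[s][x] * sim_etude6[s][x] for s in range(n_subjects)]
        scores.append(sum(products) / n_subjects)
    return scores

# Two hypothetical subjects rating three excerpts on an arbitrary 0-10 scale:
sim4 = [[2.0, 5.0, 8.0], [3.0, 4.0, 7.0]]
sim6 = [[6.0, 5.0, 7.0], [5.0, 6.0, 8.0]]
print(combined_similarity(sim4, sim6))  # [13.5, 24.5, 56.0]
```

The product rewards excerpts rated similar to both pieces at once; an excerpt close to only one piece scores low, which is the point of the measure.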
Figure 6-4: Means and standard deviations across subjects of the
similarity estimates between each of the excerpts and both training
pieces.
It is not easy to draw definite, unambiguous conclusions from the results
obtained. One could conclude that method 2
is the most successful in terms of preserving structural characteristics of
both training pieces because excerpt 8 scored the highest in the similarity
estimates of both pieces combined. Yet, excerpt 2 (also generated with
method 2) obtained a very low combined similarity to both training
pieces. As we mentioned in the previous chapter, a small change in
the initial condition of a state space interpolation can greatly alter the
evolution of the resulting observation sequence. Thus, while method
2 may be an excellent method for combining two pieces, an optimal
result may depend on the careful choice of model parameters and initial
conditions.
Method 3 with components modeled independently strongly preserves
structure from one of the training pieces. Thus, rather than as a method
of combining two pieces, it might be better understood as a method of
modulating one piece with another.
Experiments and Conclusions
The similarity ratings between the Gaussian noise excerpts (Pair 5) and
the two training pieces were surprisingly high. How is it possible that
these excerpts were rated with similarity values comparable to those
generated with some of our methods, particularly methods 1 and 3 with
components modeled jointly? To our own perception, any of the excerpts
generated with methods 1, 2 or 3 is clearly better at preserving
structure from the training pieces than the Gaussian noise examples.
Even the excerpts generated with method 3 with components modeled
jointly (which, as we saw in Chapter 5, generates noisy sequences) have large
scale evolutions that the Gaussian noise excerpts lack, thus potentially
making them distinguishable. Could it be that we have overestimated
our methods, or that we have overestimated the subjects'
ability to perceive music structure, or both?
Most subjects who commented on the experiment expressed difficulty in
distinguishing clear differences between the excerpts. Indeed, in general
the results from both the clustering and the similarity experiments reveal
poorer aural discriminability than we expected. But the large variances in
all the results also reveal a wide range of ears.
Some subjects also talked about their experience with judging similarity.
A non-musician said she noticed mood changes from excerpt to excerpt,
so this was her main criterion for judging similarity. A professional
musician said that he noticed his criteria for judging similarity changed
from pair to pair. First he started rating the similarity between excerpts
without thinking much about it. Later, as he rationalized the task, he
realized that his criteria for similarity were many, and that the weight of
each criterion changed from excerpt to excerpt. Sometimes some features
were more salient than others, thus influencing more his decision about
what was similar.
There is clearly a large variety of perceptual approaches among different
listeners. Different people focus on different aspects of a piece, and
therefore judge similarity in different terms. Each individual's perception
of the musical examples given is also influenced by their experience with
music similar to the pieces used, and with music in general. It is
noteworthy that subjects no. 1 and no. 10 (who had the best discrimination
results) are professional musicians. In addition, subject no. 1 was
acquainted with both training pieces. Indeed, two Mozart sonatas may
sound very different to a listener acquainted with his music, while they
may sound like the same piece to an "untrained" person. Humans, just
like machines, are trained to perceive the world in different ways, and
the training data individuals feed on can be quite different. One man's
music is another man's noise, and we see this in the arts continuously.
For humans, too, perception requires learning.
6.2 Conclusions and Future Work
In this work we have presented an approach to the problem of inductive
music structure modeling from a dynamical systems and signal processing
perspective, one which focuses on the dynamic properties of music. The
point of departure of the approach was the reconstruction of a state
space from which to obtain essential characteristics of music's dynamics,
such as its number of degrees of freedom. This multidimensional
representation allowed us to obtain generalizations about the structure of
a given piece of music, from which we generated novel musical sequences.
We presented three main types of generalization strategies (Chapter 5):
State space Interpolation, Abstraction of the dynamics from the
states and Decomposition of the musical signals using wavelets and
PCA.
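The state space reconstruction underlying all three strategies can be sketched as a standard delay embedding. The names, the pitch values, and the choice of embedding parameters below are illustrative only:

```python
def delay_embed(signal, dim, tau=1):
    """Return the delay vectors [s[n], s[n+tau], ..., s[n+(dim-1)*tau]]."""
    n_vectors = len(signal) - (dim - 1) * tau
    return [[signal[n + d * tau] for d in range(dim)] for n in range(n_vectors)]

pitches = [60, 62, 64, 62, 60, 59, 60]   # a short pitch sequence (MIDI numbers)
states = delay_embed(pitches, dim=3)     # points in a 3-D reconstructed state space
print(states[0], states[1])  # [60, 62, 64] [62, 64, 62]
```

Each embedded point is a window onto the sequence's recent past, so the geometry of the resulting point cloud encodes the dynamics that the generalization strategies then manipulate.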
We used local linear models to model the reconstructed state space structure deterministically because of the method's ability to generate a variety of transformations, its speed, and its simplicity. There are, of course,
other ways of modeling the state space's reconstructed structure with
global methods, such as using polynomials, neural networks or radial
basis functions. Our explorations of radial basis functions were brief, so
we did not document them here. Nonetheless, it is worth mentioning
that we abandoned this method because, in contrast to the local linear
models used, the newly generated sequences using this method tended to
fall into attractors and thus generated infinite cycles. In addition, unless
one uses almost as many radial basis functions as there are points in state
space, the resulting interpolations are smooth, which is a problem for
discontinuous sequences. Therefore, this method is not
very useful unless one is dealing with continuous sequences only. Other
modeling strategies remain to be explored. Of particular importance is
the incorporation of stochastic methods into the overall system.
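A single prediction step of a neighbor-based local model can be sketched as below. This is a zeroth-order simplification of the local linear models described in the text, with an inverse-distance weighting exponent standing in for the exponent p mentioned later; all names and values are hypothetical:

```python
import math

def predict_next(state, states, successors, nn=3, p=2):
    """Inverse-distance-weighted average of the successors of the nn nearest states."""
    ranked = sorted(zip(states, successors), key=lambda t: math.dist(state, t[0]))
    nearest = ranked[:nn]
    weights = [1.0 / (math.dist(state, s) ** p + 1e-12) for s, _ in nearest]
    return sum(w * succ for w, (_, succ) in zip(weights, nearest)) / sum(weights)

# Toy trajectory: each stored state is paired with the value that followed it.
states = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
successors = [5.0, 9.0, 7.0]
print(round(predict_next([1.0, 0.0], states, successors, nn=2), 3))  # 9.0
```

The averaging over neighbors is what makes the method fast and simple, and it is also the source of the low-pass filtering discussed at the end of this chapter: more neighbors means more smoothing.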
We discussed the tight relationship between states and dynamics, and
how a given collection of states allows only a limited variety of state
space paths (dynamics), and vice versa. We demonstrated the use of
rotations of the
state space to generate novel state sequences that preserve the exact
same dynamics. However, we did not explore the reverse: keeping the
same states while changing their temporal relationships. The classical
(and probably only frequently used) transformation in this respect is reversing the sequence, but evidently other reorderings are possible. These
remain to be explored.
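The rotation transformation mentioned above can be sketched in two dimensions: applying the same orthogonal rotation to every embedded state produces new states while preserving all mutual distances, and hence the dynamics. The trajectory values are illustrative:

```python
import math

def rotate_states(states, theta):
    """Apply the same planar rotation (angle theta, radians) to every 2-D state."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y] for x, y in states]

traj = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
rotated = rotate_states(traj, math.pi / 2)
# All pairwise distances (hence the dynamics) are preserved:
d_before = math.dist(traj[0], traj[1])
d_after = math.dist(rotated[0], rotated[1])
print(abs(d_before - d_after) < 1e-9)  # True
```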
By decomposing musical sequences hierarchically we were able to achieve
an additional level of flexibility that allowed us to perform transformations at different time scales. We discussed two decomposition methods:
wavelets and PCA; but we did not explore the more robust family of
transformations known as ICA. Particularly for the case of music, where
nonlinear relationships frequently exist, PCA offers a limited solution to
the problem of obtaining non-redundant decompositions. Thus, future
iterations of this work will explore more robust decomposition methods.
Related to this problem is that of stream segregation discussed below.
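As a minimal illustration of the hierarchical decomposition idea, a one-level Haar split separates a sequence into a large-scale approximation and a detail signal. Haar is chosen here only because it is the simplest wavelet; the pitch values are made up:

```python
def haar_step(signal):
    """One-level Haar split: pairwise means (approximation) and half-differences (detail)."""
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
    return approx, detail

pitches = [60, 62, 64, 62, 60, 59, 57, 55]
a, d = haar_step(pitches)
print(a)  # [61.0, 63.0, 59.5, 56.0]
print(d)  # [-1.0, 1.0, 0.5, 1.0]
```

Repeating the split on the approximation gives coarser and coarser time scales, each of which can then be transformed independently.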
6.2.1 System's Strengths, Limitations and Possible Refinements
In our effort to be as unbiased as possible towards music, we have tried to
make our approach to music structure modeling very general. In doing so
we have paid the price of generality. Many of the sequences that
we have generated are rather coarse approximations that frequently miss
some perceptually important characteristics. As we stated in Chapter 1,
it is really not possible to model all kinds of music with a single approach, so while the approach presented here may be a reasonable point
of departure for any musical piece, the particularities of each work can
only be faithfully captured with more specific and more refined methods.
What refinements could we implement that would still be applicable to
a broad variety of music? We presented methods of decomposing musical
signals hierarchically, making it possible to transform the details or the
large scale approximations of the dynamics of music independently. But
we did not discuss or attempt to extract repeating patterns such as
"themes" or motives that could be treated independently from other
musical materials. Thus, a further refinement of the system would be to
try to find these kinds of structures, and to segment the music in more
detail. Another more ambitious task would be to segregate streams that
may be running in parallel in a single musical component such as pitch.
Musical multiplexing is quite common in monophonic pieces, and the
separation of perceivable streams or voices would allow greater control
and refinement of the possible transformations applied to music.
While we briefly mentioned the possibility of modeling the dynamic properties of sieves, in all the examples presented and experiments performed
we modeled the sieves as a stationary structure spanning the entirety of
the piece(s). A more refined yet still general modeling approach would
consider sieves not just as stationary "outside-time" structures, but as
dynamic ones as well.
Experiments and Conclusions
As can be seen and heard, the approach to music structure modeling
we have presented is quite good at capturing the large scale dynamics
of music and generating novel sequences at these large temporal scales.
But due to the filtering performed by the local linear models, we have had
to sacrifice some of the high-energy dynamics which are so clearly perceivable, and thus fundamental, frequently making the short time-scale
sequences sound unrelated to the training piece.¹ Thus, we have the reverse
of the tradeoff typically found in Markov models and in some music-generative applications of neural networks [29]. A possible solution to
this problem might be a more in-depth use of hierarchical decomposition
of the musical sequences presented in Chapter 4. Because the high frequency content is removed from the large scale dynamics (e.g. wavelet
approximations), we could safely use many neighbors in the state space
interpolation without causing any additional filtering. Meanwhile, the
state space reconstruction of the details would be interpolated with very
few neighbors to avoid filtering high frequencies in the manner we have
discussed in the previous chapter. By combining these two models it
might be possible to obtain good small scale variations as well as large
scale ones.
¹ Remember that, in order to reduce the filtering, we typically had to reduce the
number of points in the velocity estimates or increase the exponent p in Equation 3.8.
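The recombination step of the two-scale model proposed above can be sketched as the inverse of a one-level Haar split: after resynthesizing the approximation and the details separately (with many and few neighbors, respectively), each approximation/detail pair is merged back into two samples. The input values are hypothetical:

```python
def recombine(approx, detail):
    """Invert a one-level Haar split: each (a, d) pair becomes (a + d, a - d)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

# Hypothetical approximation and detail sequences, e.g. each independently
# interpolated in its own reconstructed state space:
print(recombine([61.0, 63.0], [-1.0, 1.0]))  # [60.0, 62.0, 64.0, 62.0]
```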
Appendix A
Notation
x              Scalar variable.
x (bold)       Vector.
s[n]           Discrete scalar signal.
s[n] (bold)    Discrete multi-dimensional signal.
s(t)           Continuous scalar signal.
s(t) (bold)    Continuous multi-dimensional signal.
(a, b)         Inner product of a and b.
ab^T           Outer product of a and b, where a and b are column vectors.
||x - y||      Euclidean distance: sqrt((x1 - y1)^2 + ... + (xm - ym)^2).
C(M, 2)        Binomial coefficient: M!/(2!(M - 2)!).
C^2            Twice continuously differentiable function.
∧              Logical AND.
∨              Logical OR.
E[x]           The expected value of x.
Appendix B
Pianorolls
B.1 Etude 4
[Pianoroll figure: pitch axis (roughly C1 to G7#) vs. time in beats.]
Figure B-1: Ligeti's Etude 4, Book 1.
B.2 Etude 6
[Pianoroll figure: pitch axis vs. time in beats.]
Figure B-2: Ligeti's Etude 6, Book 1.
B.3 Interpolation: Etude 4
[Grid of pianoroll panels, each labeled with its parameter values (dim, nn, p); pitch vs. time in beats.]
Figure B-3: Selection of generated sequences for different model parameters. The training piece is Ligeti's Etude no. 4, Book 1.
B.4 Combination of Etudes 4 and 6, Method 1
[Grid of pianoroll panels, each labeled with its parameter values (dim, nn, p); pitch vs. time in beats.]
Figure B-4: 500-note fragments from combinations of Ligeti's Etudes 4 and 6, Book 1, using method 1, with 20% for Etude 4 and 80% for Etude 6. Each panel is a combination of the Etudes using different parameter values.
B.5 Combination of Etudes 4 and 6, Method 2
[Grid of pianoroll panels, each labeled with its parameter values (dim, nn, p); pitch vs. time in beats.]
Figure B-5: 500-note fragments from combinations of Ligeti's Etudes 4 and 6, Book 1, using method 2. Each panel is a combination of the Etudes using different parameter values.
B.6 Combination of Etudes 4 and 6, Method 3 with Components Modeled Jointly
[Grid of pianoroll panels, each labeled with its parameter values (dim, nn, p); pitch vs. time in beats.]
Figure B-6: 500-note fragments from combinations of Ligeti's Etudes 4 and 6, Book 1, using method 3 with parameters modeled jointly in a single state space. Each panel is a combination of the Etudes using different parameter values.
B.7 Combination of Etudes 4 and 6, Method 3 with Components Modeled Independently
dim=6, nn=2, p-4
dim=6, nn=2, p-B
Tim in beats
Tim in beats
dimaS, nn=2, p=64
dim-6, nn=2, p-128
dim-6, nn=2, p-16
dim-6, nn=5, p.4
|
-
,
.
~S:6-,"l--.
6
P|
It
-
11
as~
1
---
-.
-
'.
o
il
I1
-
~q'1
~
~
E4
I
F3Il
-1
11
4I
-C
S.,.
.
~
1.
2
.,
II
I
,
1
VE T
Nip Io
I
a
soI
Tim in beats
-
.
5.
I.*
F4
=n5, p-S
dil.B,
dim=6,nn=5, p-16
dlm-6, nn=5,p.64
*
*.
%
5
-
-.
-pr
.
I
Ii
-
n
s
I I
-=
10
20
..Ir.a
:.-.-
..
...
-
-
I
V
nnl
p.6
-
Fel - .. Am IsTy
I 1n 207q3
CalISf-
i
.. Tim in
.J beats
5'
0
0
6IN
S
't5its
D5#
'I
o
i=6
.
'SIt
-P.-jViPS
I
30
Tim in beats
Figure B-7: 500-note fragments from combinations of Ligeti's Etudes 4 and 6, Book 1, using Method 3 with components modeled independently. Each panel is a combination of the Etudes using different parameter values.
[Figure panels, continued: pitch vs. time in beats for combinations of dim = 6, 8, 10, 12 with nn = 2, 4, 6, 7, 8, 10, 11, 12, 14, 17 and p = 4, 8, 16, 64, 128.]
Appendix C
Experiment Forms
Experiment 2
Part 1
For each fragment answer the following questions:
1. How complex is the fragment?
2. How much do you like this fragment?
3. What group does the fragment belong to?
After listening to all the fragments, cluster them into a maximum of five groups based on SIMILARITY. If you feel the fragments can only be grouped in two, use only numbers 1 and 2; if you feel they can only be grouped in three, use numbers 1, 2 and 3; and so on.
For both complexity and preference we use a scale from 0 to 6, 0 being the least complex and least preferred, 6 being the most complex and most preferred.
Please listen to all the musical fragments before answering the questions.
You might want to use pen and paper to take notes of the groupings you may come up with before putting your answers in the form.
[Form: table with columns "complexity", "preference", and "group" for fragments [F1] through [F10].]
How familiar are you with the music you just heard?
( ) never heard any of it before
( ) sounds familiar
( ) I know I've heard it before, but don't know what it is
( ) I know exactly what one (some) of them is (are); give name(s):
Experiment 2
Part 2
Rate the SIMILARITY between each of the fragments [F1], [F2], ..., [F10] making up the rows of the following table, and each of the two pieces labeled [A] and [B] making up the columns; 0 being minimum similarity, 6 being maximum similarity.
Please listen to the comparison pieces [A] and [B] carefully before you begin, and feel free to go back and listen to them as many times as you want.
[Form: similarity rating table with rows [F1] through [F10] and columns [A] and [B].]
How similar are [A] and [B]?