Algorithms for pattern discovery and pitch spelling in music

advertisement
Music Processing Algorithms
David Meredith
Recent projects

Musical pattern matching and discovery
• Finding occurrences of a query pattern in a work
• Finding works that are similar to a query work
• Discovering themes in a work

Pitch spelling
• Predicting the pitch names (e.g., C#4,
B@3) of notes in a “piano-roll” representation
(e.g., MIDI)
• Essential for transcription from MIDI (or audio) to
notation
Algorithms for pattern matching
and pattern discovery in music
Uses of musical pattern
discovery algorithms

In content-based music retrieval
• Creating an index of memorable patterns to enable faster retrieval

For music analysts, performers and listeners
• A motivic/thematic analysis can assist understanding and
appreciation

In transcription
• Helps with inferring beat and metrical structure
• similar patterns have similar metrical structure
• Helps with inferring grouping and phrasing
• “parallellism” (Lerdahl and Jackendoff, 1983) most important factor in
grouping

In composition and improvisation
• Cure composer’s block by suggesting new material based on
patterns discovered in music already written
• Automatically create new music that develops themes discovered
in music already played
• Use analysed thematic structure as a template for a new work
Importance of repeated patterns
in music analysis and cognition

Schenker (1954. p.5):
• repetition “is the basis of music as an art”

Bent and Drabkin (1987, p.5):
• “the central act” in all forms of music analysis is
“the test for identity”

Lerdahl and Jackendoff (1983, p.52):
• “the importance of parallelism [i.e., repetition] in
musical structure cannot be overestimated. The
more parallelism one can detect, the more
internally coherent an analysis becomes, and the
less independent information must be processed
and retained in hearing or remembering a piece”
Most musical repetitions are
neither perceived nor intended
Rachmaninoff, Prelude in C sharp minor, Op.3, No.2, bars 1-6
Interesting musical repetitions
are structurally diverse

Want to discover all and only interesting
repeated patterns
• i.e., themes and motives

Class of interesting repeated patterns is
structurally diverse because
• patterns vary widely in structural characteristics
• many ways of transforming a musical pattern to
give another pattern that is perceived to be a
version of it
• e.g., we can transpose it, embellish it, change tempo
harmony, accompaniment, instrumentation, etc.
Example of repeated motive
Barber, Sonata for Piano, Op.26, 1st mvt, bars 1-4
Example of thematic
transformation
J.S.Bach, Contrapunctus VI from Die Kunst der Fuge, bars 1-5
String-based algorithms for
discovering musical patterns

Most previous approaches assume music
represented as strings
• each string represents a voice or part
• each symbol represents a note or an interval
between two consecutive notes in a voice

Similarity between two patterns measured
in terms of edit distance calculated using
dynamic programming
• see, e.g., Lemstrom (2000), Hsu et al. (1998),
Rolland (1999)
Problems with the string-based
approach - Edit distance
B is an embellished version of A
 If both patterns represented as strings

each symbol represents pitch of note
 then edit distance between A and B is 9


If allow pattern with 9 differences to count
as a match, then get many spurious hits
Problems with string-based
approach - Polyphony

If searching polyphonic music and
• do not know voice to which each note belongs (e.g., MIDI
format 0 file); or
• interested in patterns containing notes from 2 or more voices

then
• combinatorial explosion in number of possible string
representations
• if don’t use all possible representations then may not find all
interesting patterns
Using multidimensional point
sets to represent music (1)
Using multidimensional point
sets to represent music (2)
SIA - Discovering all maximal
translatable patterns (MTPs)
Pattern is translatable by vector v in dataset if it can be translated
by v to give another pattern in the dataset
MTP for a vector v contains all points mapped by v onto other
points in the dataset
O(kn2 log n) time, O(kn2) space
where k is no. of dimensions & n is no. of points
O(kn2) average time with hashing
SIATEC - Discovering all
occurrences of all MTPs
Translational
Equivalence Class
(TEC) is set of all
translationally
invariant
occurrences of a
pattern
Absolute running times of SIA
and SIATEC


SIA and SIATEC implemented in C
run on a 500MHz Sparc on 52 datasets



6≤n≤3456, 2≤k≤5
< 2 mins for SIA to process piece with 3500 notes
13 mins for SIATEC to process piece with 2000 notes
Need for heuristics to isolate
interesting MTPs


2n patterns in a dataset of size n
SIA generates < n2/2 patterns
• => SIA generates small fraction of all patterns in a dataset


Many interesting patterns derivable from
patterns found by SIA
BUT many of the patterns found by SIA are
NOT interesting
• 70,000 patterns found by SIA in Rachmaninoff’s Prelude in
C# minor
• probably about 100 are interesting

=> Need heuristics for isolating interesting
patterns in output of SIA and SIATEC
Heuristics for isolating musical
themes and motives
Cov=6
CR=6/5
Cov=9
CR=9/5
Comp = 1/3
Comp = 2/5
Comp = 2/3
Coverage  Number of points covered by occurrences of the pattern
Compactness =
Number of points in pattern
Number of points in region spanned by pattern
Compression ratio 
Coverage
Number of points in pattern + Freq. of occurrence of pattern -1
COSIATEC - Data compression
using SIATEC
Start
Dataset
SIATEC
List of <Pattern, Translator_set> pairs
Print out best pattern, P, and its translators
Remove occurrences of P from dataset
Is dataset empty?
Yes
End
No
Using COSIATEC for finding
themes and motives in music
First iteration
Second iteration
SIAM - Pattern matching using
SIA
Query pattern
Dataset






k dimensions
n points in dataset
m points in query
O(knm log(nm)) time
O(knm) space
O(knm) average time with
hashing
Improving SIAM - Ukkonen,
Lemström & Mäkinen (2003)




Use sweepline-like scanning of the dataset (Bentley and
Ottmann, 1979)
Generalized to approximate matching of sets of
horizontal line-segments
However, restricted to 2-dimensional representations
(unlike SIA-family)
Improved complexity to



O(mn log m + n log n + m log m) running time (without hashing)
O(m) working space
Implemented as algorithm P2 on C-BRAHMS demo
web site
• <http://www.cs.helsinki.fi/group/cbrahms/demoengine/>
Improving SIAM - MSM
(Clifford et al., 2006)


Finding size of maximal match is 3SUM hard
(i.e., O(n2) )
Reduce problem of multi-dimensional point-set
matching to 1d binary wildcard matching





Random projection to 1D
Length reduction by universal hashing
Binary wildcard matching using FFTs
Find best match and check in O(m) time exactly how
many points match at the location that can be
inferred from this match
Reduces time complexity to O(n log n)
Evaluating MSM:
Precision-Recall




Compared with OMRAS (Pickens et al., 2003)
Test set of 2338 documents, 480 used as queries
All score encodings in strict score time
Queries had notes deleted, transposed and inserted
Evaluating MSM:
Running time



Run on prefixes of various sizes of first movement of
Beethoven’s 3rd Symphony
Each prefix matched against itself
Compared with largest common subset algorithm of
Ukkonen, Lemström and Mäkinen (2003)

MSM nearly 2 orders of magnitude faster (log scale)
Pitch spelling algorithms
Chromatic pitch
A pitch spelling algorithm
takes this...
Time
Diatonic pitch
...and computes this
Time
Why are pitch spelling algorithms
useful?


In transcription, for generating a correctly
notated score from a MIDI or audio file
In content-based music retrieval



For representing better the perceived tonal
relationships between notes
Allows us to find occurrences that sound like the
query but contain different chromatic intervals
For better understanding the cognitive
processes that underlie the perception of tonal
music
Why is the same sound spelt
differently in different contexts?
1
3
2
4
Comparative analysis of pitch
spelling algorithms

Algorithms analysed, evaluated and (in
some cases) improved
•
•
•
•
•

Longuet-Higgins (1976, 1987, 1993)
Cambouropoulos (1996,1998, 2001, 2003)
Temperley (2001)
Chew and Chen (2003, 2005)
Meredith (2003, 2005, 2006)
Test corpus
• 195972 notes, 216 movements, 8 baroque and
classical composers
• almost exactly equal number of notes (24500) for
each composer
Freq
MIDI Note number
The PS13s1 algorithm
Tonic chroma and pitch name class
Time
Initial pitch name class
Ebb Bbb Fb
2
9
4
Cb Gb Db
Ab
Eb
Bb
F
C
G
D
A
E
B
F#
11
8
3
10
5
0
7
2
9
4
11
6
6
1
1
C# G# D#
1
T
T1
T
2
T
T
1
1
T
1
8
3
A#
10
Freq
MIDI Note number
The PS13s1 algorithm
Tonic chroma and pitch name class
Initial pitch name class
Ebb Bbb Fb
2
9
4
Time
Cb Gb Db
Ab
Eb
Bb
F
C
G
D
A
E
B
F#
11
8
3
10
5
0
7
2
9
4
11
6
6
1
C# G# D#
1
T1
T
1
T
T
1
1
T
1
T
2
8
3
A#
10
Evaluation criteria and
performance metrics

Evaluation criteria
• Spelling accuracy - how well an algorithm predicts the pitch
names
• Style dependence - how much spelling accuracy depends
on style

Performance metrics
• Note error rate - proportion of notes in corpus spelt
incorrectly
• Style dependence - standard deviation of note error rates
over 8 composers

Robustness to temporal deviations
• Best versions of algorithms also run on version of test
corpus in which onsets and durations were randomly
adjusted
• To evaluate how well algorithms would work on files
generated directly from performances
Results for algorithms that were
most accurate over clean corpus
Algorithm
Clean corpus
NER%
Noisy corpus
SD NER%
SD
PS13s1x
0.56
0.49
0.61
0.54
Temperley*
0.70
1.13
3.32
3.91
Chew and Chen+
0.85
0.35
0.99
0.55
Cambouropoulos+
0.85
0.47
0.93
0.53
Longuet-Higgins§
1.79
1.79
1.75
1.71
Fixed LOF Range (Eb-G#)
4.38
1.47
4.38
1.47
xK =
pre
10, Kpost= 42
*Two-pass, half tempo corpus, without enh. change (MH2PX2)
+New optimized versions (CamOpt and CCOP01-06)
§Only when music processed a voice at a time (LH1V)
Future work
Further development of SIA family
of algorithms




Compare SIA algorithms with methods
developed in other more mature fields (e.g.,
computer vision, graph matching)
Improve time complexity of SIA algorithms with
techniques such as ones used in MSM
Adapt algorithms for approximate matching and
scaling (matching at different tempi)
Adapt SIA and SIATEC for early pruning of
uninteresting patterns
Further work on PS13s1
Incorporate PS13s1 into complete MIDIto-notation transcription system
 Incorporate PS13s1 into Sibelius notation
software
 Use PS13s1 for key-tracking and
harmonic analysis
 Use PS13s1 for feature extraction on
audio data

Web-scale content-based music
search engine

“Query-by-humming”:


Sound-based browsing


Allows users to browse for music that has similar
tempo, rhythm, harmony, melody, timbre, etc.
Indexing on


Interactive input system that allows user to enter
query multiple times and/or adjust displayed
interpretation
melody, harmony, rhythm, tempo, loudness, timbre,...
Crawling

iTunes, Amazon, MusicBrainz, MySpace, ...
Interactive music systems

Design of interactive musical spaces in which
motion is mapped onto sound in interesting
ways


e.g., experimentation with various different pitch and
timbre spaces
Design of software and hardware that
maximises creativity

e.g. instruments and software that can be used by
both musically trained and untrained to create
interesting music and manipulate it
Transcription

From MIDI to notation

Requires algorithms for
•
•
•
•
•
•
pitch spelling (e.g., PS13s1)
key tracking (also PS13s1?)
metrical structure analysis (finding the beat)
voice analysis
quantization of note duration values
phrase structure analysis
Composition and improvisation


Creating templates for new pieces of music by
analysing the tonal, rhythmic and thematic structures of
corpora of existing works
“Auto-completion” or a “composer’s deblocker”


Intelligent algorithmic improvisation system



Analyses music already written and suggests ways in which it
might be continued
Finds salient themes and motives in music already generated
Creates new music that manipulat these themes in interesting
ways
Algorithm for complete stylistic composition


Input a set of songs or works to be analysed
System composes a new work that has similar structure to
those analysed
Open-source object-oriented music
processing framework



Supports the rapid development and testing of
algorithms for analysing, retrieving, recognizing,
transcribing, composing and performing music.
Classes and interfaces that successfully
encapsulate the properties of a musical
passage, work or collection of works
Supports the operations and transformations
that a musician, musicologist, composer,
listener or performer might want to carry out on
the music.
Systems for corpus-building


Need large, varied and high quality “ground truth” testcorpora for testing algorithms in music analysis and
music information retrieval
Currently such corpora are scarce


Need more effective methods for digitizing musical
resources such as scores and expert analyses and
performances


Serious obstacle to progress in MIR and computational
musicology
Faster and more reliable systems for creating structured digital
encodings from printed scores, recordings
Such systems are of value to libraries and other parties
(e.g., internet search engines) interested in making
such resources available online.
SLUT
Mange tak!
References












Bent, I. and Drabkin, W. (1987) Analysis. Macmillan.
Bentley, J. and Ottmann, T. (1979) "Algorithms for reporting and counting geometric intersections". IEEE
Transactions on Computers, C(28), 643-647.
Clifford, R., Christodoulakis, M., Crawford, T., Meredith, D. and Wiggins, G. A. (2006) "A fast, randomised,
maximal subset matching algorithm for document-level music retrieval". In Proceedings of the 7th International
Conference on Music Information Retrieval (ISMIR 2006), Victoria, Canada.
Hsu, J.-L., Liu, C.-C. and Chen, A. L. B. (1998) "Efficient repeating pattern finding in music databases". In
Proceedings of the 1998 ACM 7th International Conference on Information and Knowledge Management, pages
281-288.
Lemström, K. (2000) String Matching Techniques for Music Retrieval. PhD dissertation, Department of Computer
Science, University of Helsinki.
Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music. MIT Press, Cambridge MA.
Meredith, D., Lemström, K. and Wiggins, G. A. (2002) "Algorithms for discovering repeated patterns in
multidimensional representations of polyphonic music". Journal of New Music Research, 31(4), 321-345.
Meredith, D. (2006) "Point-set algorithms for pattern discovery and pattern matching in music". In Content-Based
Retrieval, Dagstuhl Seminar Proceedings, 06171.
Pickens, J., Bello, J. P., Monti, G., Sandler, M., Crawford, T., Dovey, M. and Byrd, D. (2003) "Polyphonic score
retrieval using polyphonic audio queries: A harmonic modeling approach". Journal of New Music Research,
32(2), 223-236.
Roland, P.-Y. (1999) "Discovering patterns in musical sequences". Journal of New Music Research, 28(4), 334350.
Schenker, H. (1954) Harmony. University of Chicago Press, London.
Ukkonen, E., Lemström, K. and Mäkinen, V. (2003) "Geometric algorithms for transposition invariant contentbased music retrieval" In Proceedings of the Fourth International Conference on Music Information Retrieval
(ISMIR 2003), Baltimore.
Download