Melodic Similarity

Presenter: Greg Eustace
Overview
• Defining melody
• Introduction to melodic similarity and its applications
• Choosing the level of representation
• Weighting
• Pre-filtering
• Algorithms for measuring melodic similarity
– Sequential representations
– 2-D and geometric representations
Defining melody
• Melody: a monophonic succession of tones characterized by
information pertaining to pitch and rhythm. Such information
contributes to a melody’s perceived shape.
• In polyphonic music the melody represents the dominant tune.
• A melody is an efficient and robust construct.
• Levitin describes a melody as “an auditory object that maintains its
identity under certain transformations... along the six dimensions of
pitch, tempo, timbre, loudness, spatial location and reverberant
environment, sometimes with changes in rhythm but rarely with
changes in contour.”
• Melodic similarity is subjective.
Melodic similarity & applications
• Smith, McNab and Witten defined similarity as “the complexity of the
transformation process involved in mapping one object onto the
other”.
• For algorithms which measure melodic similarity, typically a short
query melody is compared with a larger melody (perhaps one of
many in a database). The general problem requires algorithms that
can provide a measure of this complexity, where the minimum value
represents the best match.
• Applications
– Academic research, compositional exploration, memory aid, etc.
– Copyright checks
– Measures of melodic similarity are integral to so-called query-by-
humming algorithms.
Choosing the level of representation
• Absolute pitch
– MIDI note numbers
– Hewlett’s base-40 system, which represents enharmonic spellings of
notes (40 per octave).
• Relative pitch (i.e. intervals)
– Intervallic representations are transposition invariant
– Related representation: modulo interval technique
• Melodic contour
– A note instance is characterized by its relationship to the preceding
note.
– The most common representation uses three categories: up, down and same.
– Other representations group ascending and descending intervals into
two or more categories.
– Consistent with perception
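The representations above can be compared with a short Python sketch. All helper names are hypothetical. The base-40 numbers assume a whole tone of 6 steps and a diatonic semitone of 5 (octave = 40), with an offset chosen so that C double-flat = 1; the offset only shifts every value by a constant. The "modulo interval" helper reads the technique as intervals reduced modulo 12, which is an assumption.

```python
# Hypothetical helpers illustrating absolute, relative and contour representations.
NATURAL_MIDI = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Base-40 offsets of the naturals from C: whole tone = 6 steps, diatonic semitone = 5.
NATURAL_BASE40 = {"C": 0, "D": 6, "E": 12, "F": 17, "G": 23, "A": 29, "B": 35}


def _alteration(spelling: str) -> int:
    """Net sharps (+) or flats (-) in a spelling such as 'F#', 'Eb' or 'Cbb'."""
    accidentals = spelling[1:]
    alter = accidentals.count("#") - accidentals.count("b")
    if abs(alter) > 2:
        raise ValueError("base-40 covers at most double sharps and double flats")
    return alter


def midi_number(spelling: str, octave: int) -> int:
    """MIDI note number with C4 = 60; enharmonic spellings collapse to one number."""
    return 12 * (octave + 1) + NATURAL_MIDI[spelling[0]] + _alteration(spelling)


def base40_number(spelling: str, octave: int) -> int:
    """Base-40 number; the +3 offset (C double-flat = 1) is an assumed convention."""
    return 40 * octave + NATURAL_BASE40[spelling[0]] + _alteration(spelling) + 3


def intervals(pitches):
    """Successive intervals; unchanged when the whole melody is transposed."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def modulo_intervals(pitches, modulus=12):
    """Intervals reduced modulo the octave (assumed reading of the technique)."""
    return [i % modulus for i in intervals(pitches)]


def contour(pitches):
    """Three-category contour: U(p), D(own) or S(ame) relative to the preceding note."""
    return "".join("U" if i > 0 else "D" if i < 0 else "S" for i in intervals(pitches))
```

For instance, E4 and F-flat4 share MIDI number 64, but their base-40 distances from C4 differ (12 for a major third, 16 for a diminished fourth), which is how base-40 keeps enharmonic spellings apart.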
Choosing the level of representation
• Other representations use I/R (implication/realization)
structures, which are based on Narmour's theory of the perception and
cognition of melodies.
• Some studies consider scalar vs. non-scalar tones
• Rhythmic information is considered less often. Relevant aspects
include note durations, rest positions and time signatures, with
metrical positions marked by stress (e.g. dynamic articulation).
Weighting & pre-filtering
• Weighting:
– Notes may be weighted according to perceptual importance.
– Note duration is the most common measure of importance (e.g. shorter
notes receive less weight).
– Another example is weighting by stress.
• Pre-filtering:
– For polyphonic music, melody extraction techniques are required to
separate the melody from all other notes in the piece.
– Standardization may be required to eliminate performance-specific
characteristics.
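A minimal sketch of both ideas, assuming notes are plain tuples. The weighted distance assumes the two melodies are already aligned note for note and weights each pitch difference by the query note's share of the total duration. For extraction it uses the simplified "skyline" heuristic (keep the highest pitch at each onset), a well-known technique chosen here as an illustrative stand-in rather than the method the presentation has in mind; it ignores note durations and overlaps.

```python
def duration_weighted_distance(query, target):
    """query, target: aligned lists of (pitch, duration) of equal length (assumption).
    Each pitch difference counts in proportion to the query note's duration."""
    total = sum(duration for _, duration in query)
    return sum(
        abs(q_pitch - t_pitch) * (q_dur / total)
        for (q_pitch, q_dur), (t_pitch, _) in zip(query, target)
    )


def skyline(notes):
    """notes: (onset, pitch) events from a polyphonic piece.
    Keeps only the highest pitch at each onset (simplified skyline heuristic)."""
    highest = {}
    for onset, pitch in notes:
        highest[onset] = max(pitch, highest.get(onset, pitch))
    return [(onset, highest[onset]) for onset in sorted(highest)]
```

In the weighted distance, a one-semitone error on a long note can outweigh a larger error on a short note, which is the intended effect of duration weighting.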
Measuring melodic similarity: Sequence Matching
• Sequence matching compares sequences of numbers representing
melodies
• Used in speech recognition, biological sequence analysis and text
information retrieval.
• Common problems: Computational inefficiency and perceptual
inconsistency.
Sequence Matching: Edit Distance
• The edit distance (or Levenshtein distance) between two sequences
is defined as “the minimum total cost of transforming one sequence
(the source sequence) into the other (the target) sequence given a
set of allowed edit operations and a cost function that defines the cost of
each edit operation.”
• Typical edit operations:
– Insertion: involves inserting an element of the target sequence into
the source sequence.
– Deletion: involves removing an element from the source sequence.
– Replacement: involves replacing an element in the source sequence
with an element from the target sequence.
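The definition above corresponds to the classic dynamic-programming edit distance. A minimal Python sketch follows; the function and parameter names are illustrative, and with all three costs set to 1 it computes the Levenshtein distance. It works on character strings and on lists of pitches alike.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of turning source into target (Levenshtein when all costs are 1)."""
    n, m = len(source), len(target)
    # c[i][j] = cost of turning source[:i] into target[:j]
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c[i][0] = i * del_cost  # delete every source element
    for j in range(1, m + 1):
        c[0][j] = j * ins_cost  # insert every target element
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            c[i][j] = min(
                c[i - 1][j] + del_cost,                      # delete source[i-1]
                c[i][j - 1] + ins_cost,                      # insert target[j-1]
                c[i - 1][j - 1] + (0 if same else sub_cost), # match or replace
            )
    return c[n][m]
```

For the character strings used as an example in this presentation, `edit_distance("cats", "mat")` returns 2: one deletion and one replacement.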
Sequence Matching: Edit Distance
• Take character strings as a simple example:
cats - mat
• Deleting one letter gives:
cat – mat
• Replacing one letter gives:
mat – mat
• Other (less common) operations:
– Fragmentation: replacing a sequence of notes in the target sequence by
a single note in the source sequence.
– Consolidation: replacing a single note in the target sequence by a
sequence of notes in the source sequence.
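Fragmentation and consolidation can be added to the edit-distance recurrence as extra "many-to-one" moves. The sketch below is in the spirit of such extended edit distances but uses an assumed cost function (pitch mismatches plus the difference in total duration), a flat insertion/deletion cost, and a hypothetical cap `max_group` on group size; it is not a transcription of any published weighting.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int
    duration: float


def group_cost(single, group):
    """Assumed cost of matching one note with a group of notes:
    number of pitch mismatches plus the difference in total duration."""
    mismatches = sum(single.pitch != g.pitch for g in group)
    return mismatches + abs(single.duration - sum(g.duration for g in group))


def edit_distance_fc(source, target, max_group=3, indel_cost=1.0):
    n, m = len(source), len(target)
    inf = float("inf")
    c = [[inf] * (m + 1) for _ in range(n + 1)]
    c[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:
                best = min(best, c[i - 1][j] + indel_cost)  # deletion
            if j > 0:
                best = min(best, c[i][j - 1] + indel_cost)  # insertion
            if i > 0 and j > 0:
                # replacement: one source note against one target note
                best = min(best, c[i - 1][j - 1] + group_cost(source[i - 1], target[j - 1:j]))
                # fragmentation: one source note against k target notes
                for k in range(2, min(max_group, j) + 1):
                    best = min(best, c[i - 1][j - k] + group_cost(source[i - 1], target[j - k:j]))
                # consolidation: k source notes against one target note
                for k in range(2, min(max_group, i) + 1):
                    best = min(best, c[i - k][j - 1] + group_cost(target[j - 1], source[i - k:i]))
            c[i][j] = best
    return c[n][m]
```

With these assumptions, a half note C split into two quarter-note Cs costs 0 via fragmentation, whereas plain replacement plus insertion (`max_group=1`) costs 2.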
Edit distance: algorithms
1. Longest common substring: Melodic similarity is ranked according
to the length of the longest contiguous sequence in the target that is
identical to a sequence in the query.
2. Longest common subsequence: matching notes may be separated
by gaps of any number and length.
3. N-grams: An n-gram is defined as a sub-sequence of n elements
from a given sequence. A measure of similarity may be determined
by counting the number of matching subsequences of a given
length.
For example, for a query consisting of the sequence of notes:
CDEFGA
There are 3 possible n-grams of length 4 to be identified in the target.
CDEF, DEFG and EFGA
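The three measures above can be sketched in a few lines of Python. Function names are illustrative; `ngram_matches` counts how many of the query's n-grams occur anywhere in the target, which is one simple way to "count matching subsequences of a given length".

```python
def longest_common_substring(query, target):
    """Length of the longest contiguous run shared by query and target."""
    best = 0
    prev = [0] * (len(target) + 1)
    for q in query:
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if q == t:
                cur[j] = prev[j - 1] + 1  # extend the run ending at the previous pair
                best = max(best, cur[j])
        prev = cur
    return best


def longest_common_subsequence(query, target):
    """Length of the longest common subsequence; matches may be separated by gaps."""
    prev = [0] * (len(target) + 1)
    for q in query:
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            cur[j] = prev[j - 1] + 1 if q == t else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def ngrams(seq, n):
    """All contiguous sub-sequences of length n, as hashable tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_matches(query, target, n):
    """Number of the query's n-grams that also occur somewhere in the target."""
    target_grams = set(ngrams(target, n))
    return sum(g in target_grams for g in ngrams(query, n))
```

For the query CDEFGA and a hypothetical target XCDEFGB, two of the three 4-grams (CDEF and DEFG) match, and the longest common substring has length 5 (CDEFG).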
Edit distance: algorithms
4. Dynamic programming: determines the best match between two
sequences on a local basis.
• Given two melodies, a matrix can be constructed from the following
formula, where c represents the matrix; q and p represent the query
and target melody sequences respectively; the index i ranges from 0
to the query length and j from 0 to the target length; d is the cost of
an insertion or a deletion; e is the cost of an exact match; and m is the
cost of a mismatch (here d = 1, e = 0 and m = 1):
(Formula figure from Chai, 2001)
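The figure with the formula is not reproduced here. A standard form of this recurrence, consistent with the symbols defined above, is sketched below; the boundary conditions, in particular c_{0,j} = 0 (which lets the query start anywhere in the target), are an assumption rather than a transcription of the original figure.

```latex
c_{i,j} = \min
\begin{cases}
c_{i-1,j} + d \\
c_{i,j-1} + d \\
c_{i-1,j-1} + e & \text{if } q_i = p_j \\
c_{i-1,j-1} + m & \text{if } q_i \neq p_j
\end{cases}
\qquad c_{i,0} = i\,d, \quad c_{0,j} = 0, \quad
\text{best cost} = \min_{j} c_{|q|,\,j}
```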
Edit distance: algorithms
• For example, matching the query [0 -2 1 2 0] against the target
[0 0 -2 1 0 2 0] with this approach gives the following matrix:
(Matrix figure from Chai, 2001)
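• Tracing back through the local minima gives the following match, where
* marks the start of each sequence and _ marks a target element with no
counterpart in the query:
[* 0 0 -2 1 0 2 0]
[* _ 0 -2 1 _ 2 0]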
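The example can be checked with a short Python sketch of the dynamic-programming matrix described above (d = 1, e = 0, m = 1). The function name is illustrative. It assumes the first row is all zeros, so the query may begin at any position in the target, and takes the minimum of the last row, so the match may end anywhere; these boundary choices are assumptions about the original method.

```python
def local_edit_distance(query, target, d=1, e=0, m=1):
    """Lowest cost of matching query against any stretch of target."""
    n, t = len(query), len(target)
    # Row 0 stays all zeros: starting the match later in the target is free.
    c = [[0] * (t + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c[i][0] = i * d  # query notes with no target notes to match
        for j in range(1, t + 1):
            c[i][j] = min(
                c[i - 1][j] + d,  # query note left unmatched
                c[i][j - 1] + d,  # target note left unmatched
                c[i - 1][j - 1] + (e if query[i - 1] == target[j - 1] else m),
            )
    return min(c[n])  # the match may end anywhere in the target
```

For the query [0, -2, 1, 2, 0] and target [0, 0, -2, 1, 0, 2, 0] the best cost is 1: no stretch of the target equals the query exactly, and a single unmatched 0 in the target accounts for the difference.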
2-D and Geometric representations
• 2-D representations may reduce a melody to a set of onset-pitch
pairs.
• Earth Mover’s Distance function (EMD): “The EMD measures a
minimum flow for transforming one weighted point set into another
for the purpose of measuring melodic similarity.”
• Proportional Transportation Distance (PTD): a variant of the EMD,
calculated by first dividing each point's weight by its point set's total
weight (for both point sets) and then calculating the EMD of the
result.
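Both measures can be sketched with a small min-cost-flow solver (successive shortest paths with Bellman-Ford), using only the standard library. The points are assumed to be (onset, pitch) pairs with Euclidean ground distance, which is an assumption; published melodic EMD variants may scale the axes differently. The EMD here moves the smaller of the two total weights and divides the work by that flow.

```python
import math


def _min_cost_flow(n_nodes, edges, source, sink, eps=1e-12):
    """Successive shortest paths. edges: (u, v, capacity, cost). Returns (flow, cost)."""
    graph = [[] for _ in range(n_nodes)]
    for u, v, cap, cost in edges:
        # Each entry: [head, residual capacity, cost, index of the reverse entry].
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])
    flow = total_cost = 0.0
    while True:
        dist, prev = [math.inf] * n_nodes, [None] * n_nodes
        dist[source] = 0.0
        for _ in range(n_nodes - 1):  # Bellman-Ford: residual costs may be negative
            changed = False
            for u in range(n_nodes):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > eps and dist[u] + cost < dist[v] - eps:
                        dist[v], prev[v], changed = dist[u] + cost, (u, k), True
            if not changed:
                break
        if dist[sink] == math.inf:
            return flow, total_cost
        push, v = math.inf, sink  # bottleneck capacity along the path
        while v != source:
            u, k = prev[v]
            push, v = min(push, graph[u][k][1]), u
        v = sink
        while v != source:  # update residual capacities
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[sink]


def emd(points_a, weights_a, points_b, weights_b):
    """EMD with Euclidean ground distance: minimum work divided by total flow."""
    n, m = len(points_a), len(points_b)
    source, sink = 0, n + m + 1
    edges = [(source, 1 + i, float(w), 0.0) for i, w in enumerate(weights_a)]
    edges += [(1 + n + j, sink, float(w), 0.0) for j, w in enumerate(weights_b)]
    edges += [(1 + i, 1 + n + j, math.inf, math.dist(pa, pb))
              for i, pa in enumerate(points_a) for j, pb in enumerate(points_b)]
    flow, work = _min_cost_flow(n + m + 2, edges, source, sink)
    return work / flow


def ptd(points_a, weights_a, points_b, weights_b):
    """Proportional Transportation Distance: normalise both weight sets, then EMD."""
    total_a, total_b = sum(weights_a), sum(weights_b)
    return emd(points_a, [w / total_a for w in weights_a],
               points_b, [w / total_b for w in weights_b])
```

The difference shows up with unequal total weights: a single point of weight 1 at (0, 0) against points (0, 0) and (3, 4) of weight 1 each has EMD 0 (all of its weight fits on the identical point), while the PTD is 2.5, because normalisation forces half the weight to travel a distance of 5.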
2-D and Geometric representations
• Geometric measures of similarity model two melodies as monotonic
pitch-duration functions of time, where the difference between
melody a (Ma) and melody b (Mb) is readily calculated as the
area between the two functions.
• The query melody could be moved relative to the target in order to
minimize the area between the two.
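A minimal sketch of the area measure, assuming each melody is a list of (onset, pitch) pairs that starts at the same time, with each pitch held until the next onset, compared up to a common end time. It only moves the query vertically (transposition); shifting in time is not handled. For a vertical shift s the area is the sum of length × |difference − s| over the intervals, which is minimised at a duration-weighted median of the pitch differences.

```python
from bisect import bisect_right


def pitch_at(melody, t):
    """Pitch sounding at time t; melody is (onset, pitch) pairs sorted by onset."""
    onsets = [onset for onset, _ in melody]
    return melody[bisect_right(onsets, t) - 1][1]


def _pieces(target, query, end):
    """Yield (pitch difference, length) for each interval where both functions are constant."""
    cuts = sorted({onset for onset, _ in target + query if onset < end} | {end})
    for t0, t1 in zip(cuts, cuts[1:]):
        yield pitch_at(target, t0) - pitch_at(query, t0), t1 - t0


def area_between(target, query, end, shift=0):
    """Area between the target and the query transposed up by `shift`."""
    return sum(length * abs(diff - shift) for diff, length in _pieces(target, query, end))


def best_transposition(target, query, end):
    """Shift minimising the area (a duration-weighted median), with the resulting area."""
    pieces = sorted(_pieces(target, query, end))
    half = sum(length for _, length in pieces) / 2
    covered = 0
    for diff, length in pieces:
        covered += length
        if covered >= half:
            return diff, area_between(target, query, end, diff)
```

A query that is an exact transposition of the target (here, five semitones higher) has a nonzero area as given, but an area of 0 after the best shift of −5.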
2-D and Geometric representations
• Other algorithms superimpose these functions on cylinders, where
time is measured in terms of the angle theta and pitch is measured
in terms of z.
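Measuring time as an angle makes the time axis circular, so sliding the query in time becomes a rotation of the cylinder. The sketch below is a discrete approximation under an assumption not in the original: both melodies are sampled at the same n equally spaced angles over one full cycle, and the continuous area is replaced by a sum of absolute pitch differences. Rotating by r samples corresponds to an angle of 2πr/n.

```python
def best_rotation(target, query):
    """target, query: pitches sampled at the same n equally spaced angles (assumption).
    Returns (smallest summed |pitch difference|, rotation in samples)."""
    if len(target) != len(query):
        raise ValueError("both melodies must be sampled at the same n angles")
    n = len(target)
    return min(
        (sum(abs(target[k] - query[(k + r) % n]) for k in range(n)), r)
        for r in range(n)
    )
```

A query that is the target started at a different point in the cycle, such as [62, 64, 60, 60] against [60, 60, 62, 64], matches exactly after a rotation of 2 samples.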