Melodic Similarity
Presenter: Greg Eustace

Overview
• Defining melody
• Introduction to melodic similarity and its applications
• Choosing the level of representation
• Weighting
• Pre-filtering
• Algorithms for measuring melodic similarity
  – Sequential representations
  – 2-D and geometric representations

Defining melody
• Melody: a monophonic succession of tones characterized by information pertaining to pitch and rhythm. Such information contributes to a melody’s perceived shape.
• In polyphonic music, the melody is the dominant tune.
• A melody is an efficient and robust construct.
• Levitin describes a melody as “an auditory object that maintains its identity under certain transformations... along the six dimensions of pitch, tempo, timbre, loudness, spatial location and reverberant environment, sometimes with changes in rhythm but rarely with changes in contour.”
• Melodic similarity is subjective.

Melodic similarity & applications
• Smith, McNab and Witten defined similarity as “the complexity of the transformation process involved in mapping one object onto the other”.
• Algorithms that measure melodic similarity typically compare a short query melody with a longer melody (perhaps one of many in a database). The general problem requires algorithms that can provide a measure of this complexity, where the minimum value represents the best match.
• Applications
  – Academic research, compositional exploration, memory aid, etc.
  – Copyright checks
  – Query by humming: measures of melodic similarity are integral to so-called query-by-humming algorithms.

Choosing the level of representation
• Absolute pitch
  – MIDI note numbers
  – Hewlett’s base-40 system, which represents enharmonic spellings of notes (40 per octave).
• Relative pitch (i.e. intervals)
  – Intervallic representations are transposition invariant.
  – Related representation: modulo interval technique
• Melodic contour
  – A note instance is characterized by its relationship to the preceding note.
  – A common representation uses three values: up, down and same.
  – Other representations group ascending and descending intervals into two or more categories.
  – Consistent with perception

Choosing the level of representation (continued)
• Other representations are based on I/R (implication/realization) structures, which derive from Narmour's theory of the perception and cognition of melodies.
• Some studies consider scalar vs. non-scalar tones.
• Rhythmic information is less often considered. Relevant aspects include note durations, rest positions, time signatures and stress (e.g. dynamic articulation).

Weighting & pre-filtering
• Weighting:
  – Notes may be weighted according to perceptual importance.
  – Note duration is the most common measure of importance (e.g. shorter notes receive less weight).
  – Another example is weighting by stress.
• Pre-filtering:
  – Melody extraction techniques are required to separate the melody from all other notes in the piece.
  – Standardization may be required to eliminate performance-specific characteristics.

Measuring melodic similarity: Sequence Matching
• Sequence matching compares sequences of numbers representing melodies.
• It is also used in speech recognition, biological sequence analysis and text information retrieval.
• Common problems: computational inefficiency and perceptual inconsistency.

Sequence Matching: Edit Distance
• The edit distance (or Levenshtein distance) between two sequences is defined as “the minimum total cost of transforming one sequence (the source sequence) into the other (the target) sequence given a set of allowed edit operations and a cost function that defines the cost of each edit operation.”
• Typical edit operations:
  – Insertion: involves adding an element to the target sequence.
  – Deletion: involves removing an element from the source sequence.
  – Replacement: involves replacing an element in the source sequence with an element from the target sequence.
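The three edit operations above can be sketched as a small dynamic-programming routine. This is a minimal illustration rather than any particular author's implementation: the function name `edit_distance`, its parameters and the default unit costs are all assumptions made for the example.

```python
from typing import Sequence


def edit_distance(source: Sequence, target: Sequence,
                  insert_cost: float = 1.0,
                  delete_cost: float = 1.0,
                  replace_cost: float = 1.0) -> float:
    """Minimum total cost of turning `source` into `target`.

    Hypothetical helper: the costs are illustrative defaults (unit costs).
    """
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of turning the first i source items
    # into the first j target items.
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost      # delete every source item
    for j in range(1, cols):
        dist[0][j] = j * insert_cost      # insert every target item
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost,                        # deletion
                dist[i][j - 1] + insert_cost,                        # insertion
                dist[i - 1][j - 1] + (0.0 if same else replace_cost) # replacement
            )
    return dist[-1][-1]


if __name__ == "__main__":
    print(edit_distance("cats", "mat"))        # 2.0: one deletion, one replacement
    print(edit_distance([2, 2, 1], [2, 1]))    # 1.0: one deletion (interval sequences)
```

The same routine works on any sequence of comparable items, so it can be applied to MIDI note numbers, intervals or contour symbols, depending on the level of representation chosen.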
Sequence Matching: Edit Distance (continued)
• Take character strings as a simple example: cats – mat
• Deleting one letter gives: cat – mat
• Replacing one letter gives: mat – mat
• Two operations are needed, so with unit costs the edit distance is 2.
• Other (less common) operations:
  – Fragmentation: a single note in the source sequence is replaced by a sequence of notes in the target sequence.
  – Consolidation: a sequence of notes in the source sequence is replaced by a single note in the target sequence.

Edit distance: algorithms
1. Longest common substring: melodic similarity is ranked according to the length of the longest contiguous sequence that is identical to a sequence in the query.
2. Longest common subsequence: matching notes may be separated by any number or size of gaps.
3. N-grams: an n-gram is a contiguous subsequence of n elements from a given sequence. A measure of similarity may be determined by counting the number of matching n-grams of a given length. For example, the query C D E F G A contains 3 n-grams of length 4 to be identified in the target: CDEF, DEFG and EFGA.

Edit distance: algorithms (continued)
4. Dynamic programming: determines the best match between two sequences on a local basis.
• Given two melodies, a matrix can be constructed from the following formula, where c is the matrix; q and p are the query and target melody sequences respectively; the index i ranges from 0 to the query length and j from 0 to the target length; d is the cost of an insertion or a deletion; e is the cost of an exact match; and m is the cost of a mismatch (d = 1, e = 0 and m = 1 in this case):
  [Formula shown as a figure; from Chai, 2001]

Edit distance: algorithms (continued)
• For example, using this approach to match the query [0 -2 1 2 0] against the target [0 0 -2 1 0 2 0], we obtain:
  [Matrix shown as a figure; from Chai, 2001]
• Tracing the local minima results in the following match:
  target: [* 0 0 -2 1 0 2 0]
  query:  [* 0 -2 1 2 0]

2-D and Geometric representations
• 2-D representations may reduce a melody to a set of onset-pitch pairs.
• Earth Mover’s Distance function (EMD): “The EMD measures a minimum flow for transforming one weighted point set into another for the purpose of measuring melodic similarity.”
• Proportional Transportation Distance (PTD): a variant of the EMD, calculated by first dividing each point’s weight by its point set’s total weight (for both point sets) and then calculating the EMD of the result.

2-D and Geometric representations (continued)
• Geometric measures of similarity model two melodies as monotonic pitch-duration functions of time, where the difference between melody a (Ma) and melody b (Mb) is readily calculated as the area between the two functions.
• The query melody can be moved relative to the target in order to minimize the area between the two.

2-D and Geometric representations (continued)
• Other algorithms superimpose these functions on cylinders, where time is measured by the angle theta and pitch by the height z.
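
The area-based geometric measure can be illustrated with a short sketch. Several things here are assumptions made for the example and are not taken from the slides: each melody is a list of (pitch, duration) pairs treated as a step function of time, only the time span covered by both melodies is compared, and "moving" the query is limited to trying whole-semitone transpositions. The names `area_between` and `best_transposition` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def _boundaries(melody):
    """Note onset times followed by the end time of the last note."""
    return [0.0] + list(accumulate(duration for _, duration in melody))


def _pitch_at(melody, bounds, t):
    """Pitch of the note sounding at time t (t strictly inside the melody)."""
    return melody[bisect_right(bounds, t) - 1][0]


def area_between(a, b, transpose=0.0):
    """Area between two step-function melodies over their shared time span.

    `transpose` is added to every pitch of melody `a` (assumed shift model).
    """
    bounds_a, bounds_b = _boundaries(a), _boundaries(b)
    end = min(bounds_a[-1], bounds_b[-1])
    # Between consecutive cut points both melodies hold a constant pitch.
    cuts = sorted({t for t in bounds_a + bounds_b if t <= end})
    area = 0.0
    for start, stop in zip(cuts, cuts[1:]):
        mid = (start + stop) / 2
        gap = abs(_pitch_at(a, bounds_a, mid) + transpose
                  - _pitch_at(b, bounds_b, mid))
        area += gap * (stop - start)
    return area


def best_transposition(query, target, shifts=range(-12, 13)):
    """Return (smallest area, shift) over the candidate semitone shifts."""
    return min((area_between(query, target, s), s) for s in shifts)


if __name__ == "__main__":
    ma = [(60, 1), (62, 1), (64, 2)]   # (MIDI pitch, duration in beats)
    mb = [(60, 2), (64, 2)]
    print(area_between(ma, mb))                 # 2.0
    shifted = [(p + 2, d) for p, d in ma]       # ma transposed up a tone
    print(best_transposition(shifted, ma))      # (0.0, -2)
```

A fuller treatment would also slide the query along the time axis; that is left out here to keep the sketch short.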